connexity-cli
Command-line client for Connexity — drive eval runs, manage agents and test cases, and gate CI on regressions, all from the terminal.
connexity-cli is a thin wrapper over the Connexity REST API. It covers the public surface used to drive eval workflows from CI: auth, agents, eval configs, test cases, runs (with SSE streaming), custom metrics, prompt editor, integrations, environments (including deploy + deployment history), calls, config, and health. Account self-service (signup, password reset) stays in the web UI.
Installation
The wheel pulls in only click, httpx, and httpx-sse — no FastAPI, no SQLModel, no LLM SDKs.
Authentication
The CLI authenticates against a Connexity API server using a Bearer JWT.
| Source | When used |
|---|---|
| `--token` / `--api-url` flags | Highest precedence: explicit per invocation |
| `CONNEXITY_CLI_API_TOKEN` / `CONNEXITY_CLI_API_URL` env vars | Typical CI usage |
| `~/.config/connexity-cli/credentials.json` (mode 0600) | Written by `connexity-cli login --save` |
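The lookup order in the table can be sketched in Python as below. This is an illustration, not the CLI's actual code: the function name `resolve_token` and the `"token"` key inside `credentials.json` are assumptions, and the table only states outright that flags take highest precedence; the env-before-file order is read off the row order.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional

# Path taken from the table above.
CREDENTIALS_PATH = Path.home() / ".config" / "connexity-cli" / "credentials.json"


def resolve_token(
    flag_token: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    credentials_path: Path = CREDENTIALS_PATH,
) -> Optional[str]:
    """Return the first token found: flag, then env var, then saved file."""
    env = os.environ if env is None else env

    # 1. An explicit --token value wins when given.
    if flag_token:
        return flag_token

    # 2. Environment variable, the typical CI source.
    env_token = env.get("CONNEXITY_CLI_API_TOKEN")
    if env_token:
        return env_token

    # 3. Saved credentials file.
    # Assumption: the file is a JSON object with a "token" key.
    if credentials_path.is_file():
        return json.loads(credentials_path.read_text()).get("token")

    return None
```

The same three-step lookup would apply to the API URL, with `--api-url` and `CONNEXITY_CLI_API_URL` in place of the token sources.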
Quick start
Authoring patterns
Every command that creates or updates a resource takes a single `--from-file PATH` (or `--from-file -` for stdin) with a JSON body that matches the backend Pydantic schema (e.g. `AgentCreate`, `RunCreate`, `EvalConfigCreate`, `CustomMetricCreate`). The CLI does not duplicate those schemas; the server validates each request and returns clear errors.
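When the JSON body is generated by a script, the stdin form avoids a temporary file. A minimal Python sketch, assuming nothing beyond the `--from-file -` convention described above; `run_with_body` is a hypothetical helper, and the caller supplies the actual command and a body that matches the relevant server schema:

```python
import json
import subprocess
from typing import Any, Mapping, Sequence


def run_with_body(
    command: Sequence[str], body: Mapping[str, Any]
) -> subprocess.CompletedProcess:
    """Run `command` with `--from-file -`, sending `body` as JSON on stdin."""
    return subprocess.run(
        [*command, "--from-file", "-"],
        input=json.dumps(body),  # serialized body is piped to the child's stdin
        capture_output=True,
        text=True,
        check=False,  # let the caller inspect returncode and stderr
    )
```

`command` would be the `connexity-cli` group and subcommand as a list of strings; check `--help` for the exact subcommand names. Validation errors from the server should surface through the returned process's output and exit code.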
Pass/fail thresholds
Every run carries two run-level pass/fail dimensions, snapshotted from the eval config and overridable per run:
| Threshold | Meaning | Default |
|---|---|---|
| `metrics_pass_threshold` | Weighted average of the judge `overall_score` across cases that produced a verdict (0-100) | 80 |
| `cases_pass_threshold` | Fraction of cases that pass / total executions, errored cases counting as not-passed (0-100) | 100 |
connexity-cli run and connexity-cli compare gate their exit code on these by default. Override per invocation:
Pass --no-fail-on-thresholds to print the verdict but exit 0 regardless. Full formula and rationale: docs/scoring-and-thresholds.md.
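To make the two dimensions concrete, here is a rough Python sketch of how they could be computed from per-case results. It is not the server's implementation: the `CaseResult` type, the per-case `weight` field, the `>=` comparison, and the handling of a run with no verdicts are all assumptions. The authoritative formula is in docs/scoring-and-thresholds.md.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class CaseResult:
    passed: bool                    # False for failed and errored cases
    overall_score: Optional[float]  # None when the judge produced no verdict
    weight: float = 1.0             # hypothetical per-case weight


def metrics_score(results: Sequence[CaseResult]) -> Optional[float]:
    """Weighted average of overall_score over cases that have a verdict."""
    scored = [r for r in results if r.overall_score is not None]
    total_weight = sum(r.weight for r in scored)
    if total_weight == 0:
        return None  # assumption: no verdicts means no metrics score
    return sum(r.overall_score * r.weight for r in scored) / total_weight


def cases_pass_rate(results: Sequence[CaseResult]) -> float:
    """Percentage of executions that passed; errored cases count as not passed."""
    if not results:
        return 0.0
    return 100.0 * sum(r.passed for r in results) / len(results)


def run_passes(
    results: Sequence[CaseResult],
    metrics_pass_threshold: float = 80.0,
    cases_pass_threshold: float = 100.0,
) -> bool:
    """True only when both dimensions meet their thresholds."""
    score = metrics_score(results)
    metrics_ok = score is not None and score >= metrics_pass_threshold
    return metrics_ok and cases_pass_rate(results) >= cases_pass_threshold
```

Note the asymmetry: a case without a verdict is excluded from the metrics average but still counts against the cases pass rate.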
Output formats
Two formats are supported, switchable per-command via --output or globally via --output on the root group:
- `table` (default): human-readable tables with auto-detected column widths
- `json`: pretty-printed JSON, friendly to `jq` / `gron` / scripting
Command tree
Each top-level group mirrors a backend router:
| Group | Purpose |
|---|---|
| `login` / `logout` / `whoami` | Auth & session |
| `agents` | CRUD, draft/publish/rollback, versions, version diff, guidelines |
| `eval-configs` | CRUD, member (test-case) management |
| `test-cases` | CRUD, bulk import/export, generate, AI editor |
| `test-case-results` | Per-test-case run result CRUD |
| `runs` | CRUD, execute, cancel, stream (SSE), baselines, compare, suggestions |
| `custom-metrics` | CRUD plus LLM-backed metric preview generation |
| `prompt-editor` | Sessions, messages, presets, streaming chat |
| `integrations` | Third-party providers (Retell), connection test, list provider-side agents |
| `environments` | Bindings + deploy, retell-versions, deployments list (history) |
| `calls` | Observed external calls (Retell), refresh / mark-seen |
| `config` | Read-only API metadata, available metrics, LLM models |
| `health` | Server health probe |
| `run` / `compare` / `baseline` | Top-level convenience wrappers for common one-shot CI workflows |
Run connexity-cli <group> --help (or connexity-cli <group> <subcommand> --help) to see flags and arguments.
Subcommand reference (selected)
Not exhaustive — run --help for the full set. These are the commands you'll reach for in CI and day-to-day work:
Exit codes
- `0`: success
- `1`: operation completed but indicates failure. Covers a failed or cancelled run, a detected regression, a candidate that failed its metrics or cases threshold (default-on, opt out with `--no-fail-on-thresholds`), a deploy that returned `status=failed`, or an `import` that returned errors
- `2`: argument / configuration error, timeout, network failure
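A CI script can branch on these codes to tell a genuine regression (`1`) from an infrastructure or configuration problem (`2`). The Python sketch below is one way to do that; `describe_exit` and `run_gate` are hypothetical helper names, and the command to run is left to the caller.

```python
import subprocess
import sys
from typing import Sequence

# Meanings as documented in the exit-code list above.
EXIT_MEANINGS = {
    0: "success",
    1: "completed, but the result indicates failure",
    2: "argument / configuration error, timeout, or network failure",
}


def describe_exit(code: int) -> str:
    """Map a documented exit code to a short description."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_gate(command: Sequence[str]) -> int:
    """Run `command`, log what its exit code means, and return the code."""
    completed = subprocess.run(command, check=False)
    print(
        f"{command[0]} exited {completed.returncode}: "
        f"{describe_exit(completed.returncode)}",
        file=sys.stderr,
    )
    return completed.returncode
```

A pipeline might retry on `2`, since timeouts and network failures are often transient, while treating `1` as a hard failure that needs review.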
License
MIT — see LICENSE.