Connexity

Data Model — connexity

Entity Relationship Diagram

erDiagram
    Agent ||--o{ Run : "tested in"
    EvalSet ||--o{ Run : "evaluated by"
    EvalSet ||--o{ EvalSetMember : "contains"
    TestCase ||--o{ EvalSetMember : "belongs to"
    Run ||--o{ TestCaseResult : "produces"
    TestCase ||--o{ TestCaseResult : "evaluated in"
 
    Agent {
        uuid id PK
        string name
        string description
        enum mode
        string endpoint_url
        text system_prompt
        jsonb tools
        string agent_model
        string agent_provider
        jsonb metadata
        timestamp created_at
        timestamp updated_at
    }
 
    TestCase {
        uuid id PK
        string name
        string description
        enum difficulty
        text_array tags
        enum status
        jsonb persona
        string initial_message
        jsonb user_context
        int max_turns
        jsonb expected_outcomes
        jsonb expected_tool_calls
        string evaluation_criteria_override
        timestamp created_at
        timestamp updated_at
    }
 
    EvalSet {
        uuid id PK
        string name
        string description
        int version
        timestamp created_at
        timestamp updated_at
    }
 
    EvalSetMember {
        uuid eval_set_id FK
        uuid test_case_id FK
        int position
    }
 
    Run {
        uuid id PK
        string name
        uuid agent_id FK
        string agent_endpoint_url
        text agent_system_prompt
        jsonb agent_tools
        string agent_mode
        string agent_model
        string agent_provider
        uuid eval_set_id FK
        int eval_set_version
        jsonb config
        enum status
        bool is_baseline
        jsonb aggregate_metrics
        timestamp started_at
        timestamp completed_at
        timestamp created_at
        timestamp updated_at
    }
 
    TestCaseResult {
        uuid id PK
        uuid run_id FK
        uuid test_case_id FK
        jsonb transcript
        int turn_count
        jsonb verdict
        int total_latency_ms
        int agent_latency_p50_ms
        int agent_latency_p95_ms
        int agent_latency_max_ms
        jsonb agent_token_usage
        jsonb platform_token_usage
        float estimated_cost_usd
        bool passed
        text error_message
        timestamp started_at
        timestamp completed_at
        timestamp created_at
        timestamp updated_at
    }
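
Because EvalSetMember carries a position column, the test cases in an eval set have an explicit order. The sketch below shows one way to read them back in that order. It uses an in-memory SQLite database only so that it runs anywhere (the jsonb types suggest PostgreSQL in production), keeps just the columns the query needs, and assumes a composite primary key on (eval_set_id, test_case_id), which the diagram does not state.

```python
import sqlite3

# Minimal subset of the schema; the composite primary key is an assumption.
SCHEMA = """
CREATE TABLE test_case (id TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE eval_set (id TEXT PRIMARY KEY, name TEXT NOT NULL, version INTEGER NOT NULL);
CREATE TABLE eval_set_member (
    eval_set_id TEXT NOT NULL REFERENCES eval_set(id),
    test_case_id TEXT NOT NULL REFERENCES test_case(id),
    position INTEGER NOT NULL,
    PRIMARY KEY (eval_set_id, test_case_id)
);
"""


def ordered_test_cases(conn: sqlite3.Connection, eval_set_id: str) -> list[str]:
    """Return the names of an eval set's test cases, sorted by member position."""
    rows = conn.execute(
        """
        SELECT tc.name
        FROM eval_set_member AS m
        JOIN test_case AS tc ON tc.id = m.test_case_id
        WHERE m.eval_set_id = ?
        ORDER BY m.position
        """,
        (eval_set_id,),
    ).fetchall()
    return [name for (name,) in rows]
```

Since a test case can appear in many eval sets through this join table, position is stored per membership rather than on TestCase itself.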

Enums

Enum            Values
Difficulty      normal, hard
TestCaseStatus  draft, active, archived
RunStatus       pending, running, completed, failed, cancelled
TurnRole        user, assistant, system, tool
AgentMode       endpoint, platform
AgentModeendpoint, platform

JSONB Nested Entities

These are stored inside JSONB columns, not as separate tables.

RunConfig (stored in runs.config)

Field                     Type                          Default
concurrency               int                           5
timeout_per_test_case_ms  int                           120000
judge                     JudgeConfig | None            None
user_simulator            UserSimulatorConfig | None    None
agent_simulator           AgentSimulatorConfig | None   None
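
If concurrency caps how many test cases a run executes at once and timeout_per_test_case_ms bounds each one (the field names suggest this, but the document does not spell it out), a run executor could look like the asyncio sketch below. run_test_case is a hypothetical placeholder for simulating a conversation and judging it, the "timeout" outcome string is invented for illustration, and the nested judge and simulator configs are omitted.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class RunConfig:
    # Scalar fields only; judge, user_simulator and agent_simulator are omitted here.
    concurrency: int = 5
    timeout_per_test_case_ms: int = 120_000


async def run_test_case(test_case_id: str, duration_s: float) -> str:
    """Hypothetical placeholder: pretend to simulate and judge one test case."""
    await asyncio.sleep(duration_s)
    return "passed"


async def execute_run(config: RunConfig, cases: dict[str, float]) -> dict[str, str]:
    """Run every case with at most `concurrency` in flight and a per-case timeout."""
    semaphore = asyncio.Semaphore(config.concurrency)
    timeout_s = config.timeout_per_test_case_ms / 1000

    async def guarded(case_id: str, duration_s: float) -> tuple[str, str]:
        async with semaphore:  # blocks while `concurrency` cases are already running
            try:
                outcome = await asyncio.wait_for(run_test_case(case_id, duration_s), timeout_s)
            except asyncio.TimeoutError:
                outcome = "timeout"
        return case_id, outcome

    pairs = await asyncio.gather(*(guarded(c, d) for c, d in cases.items()))
    return dict(pairs)
```

For example, with concurrency=2 and timeout_per_test_case_ms=50, a case that takes 10 ms finishes as "passed", while one that takes 200 ms is cut off and reported as "timeout".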

JudgeConfig (nested in RunConfig.judge)

Field           Type                           Default
metrics         list[MetricSelection] | None   None
pass_threshold  float                          75.0
model           str | None                     None
provider        str | None                     None

UserSimulatorConfig (nested in RunConfig.user_simulator)

Field              Type           Default
mode               SimulatorMode  llm
scripted_messages  list[str]      []
model              str | None     None
provider           str | None     None
temperature        float | None   None
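
A sketch of how a simulator might choose the next simulated user message. It assumes SimulatorMode has a "scripted" value alongside "llm" (only "llm" appears in this document), that scripted mode replays scripted_messages in order, and that an exhausted script ends the conversation; generate_with_llm is a hypothetical callable standing in for a model client, not a real API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional


class SimulatorMode(str, Enum):
    LLM = "llm"
    SCRIPTED = "scripted"  # assumed value; only "llm" is documented


@dataclass
class UserSimulatorConfig:
    mode: SimulatorMode = SimulatorMode.LLM
    scripted_messages: list[str] = field(default_factory=list)
    model: Optional[str] = None
    provider: Optional[str] = None
    temperature: Optional[float] = None


def next_user_message(
    config: UserSimulatorConfig,
    user_turn_index: int,
    generate_with_llm: Callable[[UserSimulatorConfig], str],
) -> Optional[str]:
    """Return the next simulated user message, or None when the script is exhausted."""
    if config.mode == SimulatorMode.SCRIPTED:
        if user_turn_index < len(config.scripted_messages):
            return config.scripted_messages[user_turn_index]
        return None  # assumed: running out of scripted lines ends the conversation
    # LLM mode: delegate generation (model/provider/temperature come from config).
    return generate_with_llm(config)
```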

AgentSimulatorConfig (nested in RunConfig.agent_simulator)

Field        Type          Default
model        str | None    None
provider     str | None    None
temperature  float | None  None
max_tokens   int | None    None

ConversationTurn (stored in test_case_result.transcript)

Field         Type
index         int
role          TurnRole
content       str | None
tool_calls    list[ToolCall] | None
tool_call_id  str | None
latency_ms    int | None
token_count   int | None
timestamp     datetime

ToolCall (nested in ConversationTurn.tool_calls)

Follows the OpenAI chat-completions tool-call shape, plus an optional tool_result field for outcomes stored by the platform.

Field        Type
id           str
type         "function" (literal string)
function     ToolCallFunction (name, arguments as a JSON string)
tool_result  Any | None
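
The transcript column holds a list of these turns serialized as one JSON document. A minimal sketch using standard-library dataclasses (the real models may be defined with a different library); function.arguments stays a JSON-encoded string inside the document, as in the OpenAI format, and writing timestamps as ISO 8601 strings is an assumed encoding.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime
from typing import Any, Literal, Optional


@dataclass
class ToolCallFunction:
    name: str
    arguments: str  # JSON-encoded string, as in the OpenAI chat-completions format


@dataclass
class ToolCall:
    id: str
    function: ToolCallFunction
    type: Literal["function"] = "function"
    tool_result: Optional[Any] = None  # platform-stored outcome, if any


@dataclass
class ConversationTurn:
    # Required fields come first (a dataclass rule), so the order differs from the table.
    index: int
    role: str  # a TurnRole value: user, assistant, system, tool
    timestamp: datetime
    content: Optional[str] = None
    tool_calls: Optional[list[ToolCall]] = None
    tool_call_id: Optional[str] = None  # on tool turns, links back to a ToolCall.id
    latency_ms: Optional[int] = None
    token_count: Optional[int] = None


def transcript_to_json(turns: list[ConversationTurn]) -> str:
    """Serialize turns for the transcript JSONB column; datetimes become ISO 8601 strings."""
    return json.dumps([asdict(t) for t in turns], default=lambda v: v.isoformat())
```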

JudgeVerdict (stored in test_case_result.verdict)

Field              Type                    Default
passed             bool
overall_score      float
metric_scores      list[MetricScore]
summary            str | None              None
raw_judge_output   str | None              None
judge_model        str
judge_provider     str
judge_latency_ms   int | None              None
judge_token_usage  dict[str, int] | None   None

MetricScore (nested in JudgeVerdict.metric_scores)

Field          Type         Default  Notes
metric         str
score          int                   0 to 5 for scored metrics; 0 or 5 for binary metrics
label          str                   scored: critical_fail, fail, poor, acceptable, good, excellent; binary: pass, fail
weight         float        1.0
justification  str
is_binary      bool         false
tier           str | None   None
failure_code   str | None   None     Judge-generated label when the metric scored poorly
turns          list[int]    []       Turn indices where the issue was observed
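
The score and label constraints in the MetricScore table can be checked mechanically before a verdict is stored. A sketch of such a check; the validate function is hypothetical, and it enforces only the rules listed above (score range and allowed labels), not any particular mapping between individual scores and labels.

```python
from dataclasses import dataclass, field
from typing import Optional

SCORED_LABELS = {"critical_fail", "fail", "poor", "acceptable", "good", "excellent"}
BINARY_LABELS = {"pass", "fail"}


@dataclass
class MetricScore:
    metric: str
    score: int
    label: str
    justification: str
    weight: float = 1.0
    is_binary: bool = False
    tier: Optional[str] = None
    failure_code: Optional[str] = None
    turns: list[int] = field(default_factory=list)


def validate(ms: MetricScore) -> list[str]:
    """Return a list of problems; an empty list means the score is well-formed."""
    problems = []
    if ms.is_binary:
        if ms.score not in (0, 5):
            problems.append(f"binary metric {ms.metric!r} must score 0 or 5, got {ms.score}")
        if ms.label not in BINARY_LABELS:
            problems.append(f"binary metric {ms.metric!r} has invalid label {ms.label!r}")
    else:
        if not 0 <= ms.score <= 5:
            problems.append(f"scored metric {ms.metric!r} must be in 0..5, got {ms.score}")
        if ms.label not in SCORED_LABELS:
            problems.append(f"scored metric {ms.metric!r} has invalid label {ms.label!r}")
    return problems
```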

AggregateMetrics (stored in runs.aggregate_metrics)

Field                       Type                    Default
unique_test_case_count      int
total_executions            int
passed_count                int
failed_count                int
error_count                 int
pass_rate                   float
latency_p50_ms              float | None            None
latency_p95_ms              float | None            None
latency_max_ms              float | None            None
latency_avg_ms              float | None            None
total_agent_token_usage     dict[str, int] | None   None
total_platform_token_usage  dict[str, int] | None   None
total_estimated_cost_usd    float | None            None
avg_overall_score           float | None            None
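
The document does not define how AggregateMetrics is derived from test_case_result rows, so the sketch below states its assumptions explicitly: a row with error_message set counts as an error rather than a failure, pass_rate is a fraction of total_executions (0 to 1), latency statistics use total_latency_ms, percentiles use the nearest-rank method, and token usage and cost totals are left out. Having both unique_test_case_count and total_executions suggests a test case can run more than once per run, which the sketch allows. ResultRow is a hypothetical flattened view of a result row.

```python
import math
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ResultRow:
    # Hypothetical subset of test_case_result columns used here.
    test_case_id: str
    passed: bool
    total_latency_ms: Optional[int] = None
    error_message: Optional[str] = None
    overall_score: Optional[float] = None  # taken from verdict.overall_score


def percentile(values: list[float], pct: float) -> Optional[float]:
    """Nearest-rank percentile (an assumed method; the doc does not specify one)."""
    if not values:
        return None
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return float(ordered[rank - 1])


def aggregate(rows: list[ResultRow]) -> dict[str, Any]:
    errors = [r for r in rows if r.error_message is not None]
    passed = [r for r in rows if r.error_message is None and r.passed]
    latencies = [r.total_latency_ms for r in rows if r.total_latency_ms is not None]
    scores = [r.overall_score for r in rows if r.overall_score is not None]
    total = len(rows)
    return {
        "unique_test_case_count": len({r.test_case_id for r in rows}),
        "total_executions": total,
        "passed_count": len(passed),
        "failed_count": total - len(passed) - len(errors),
        "error_count": len(errors),
        "pass_rate": len(passed) / total if total else 0.0,
        "latency_p50_ms": percentile(latencies, 50),
        "latency_p95_ms": percentile(latencies, 95),
        "latency_max_ms": float(max(latencies)) if latencies else None,
        "latency_avg_ms": sum(latencies) / len(latencies) if latencies else None,
        "avg_overall_score": sum(scores) / len(scores) if scores else None,
    }
```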

Persona (stored in test_case.persona)

Field         Type
type          str
description   str
instructions  str

ExpectedToolCall (stored in test_case.expected_tool_calls)

Field            Type                    Default
tool             str
expected_params  dict[str, Any] | None   None
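
How the platform decides that an ExpectedToolCall was satisfied is not specified here. One plausible rule, sketched below as an assumption: some tool call in the transcript has the expected function name and, if expected_params is set, its decoded arguments contain every expected key with an equal value (extra arguments are ignored). actual_calls are OpenAI-shaped tool_calls entries gathered from the transcript's assistant turns; matches is a hypothetical helper.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ExpectedToolCall:
    tool: str
    expected_params: Optional[dict[str, Any]] = None


def matches(expected: ExpectedToolCall, actual_calls: list[dict[str, Any]]) -> bool:
    """True if any actual tool call satisfies the expectation (assumed matching rule)."""
    for call in actual_calls:
        fn = call.get("function", {})
        if fn.get("name") != expected.tool:
            continue
        if expected.expected_params is None:
            return True  # assumed: no params listed means any arguments are acceptable
        args = json.loads(fn.get("arguments") or "{}")
        if all(k in args and args[k] == v for k, v in expected.expected_params.items()):
            return True
    return False
```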

expected_outcomes (stored in test_case.expected_outcomes)

Free-form dict[str, Any]. Keys are descriptive labels (e.g. "refund_initiated"), values are expected state (bool, string, etc.). The judge interprets these semantically.

Indexes

Table             Column        Type
test_case         difficulty    btree
test_case         status        btree
test_case         tags          GIN
eval_set          name          btree
eval_set_member   eval_set_id   btree
run               agent_id      btree
run               eval_set_id   btree
run               status        btree
run               is_baseline   btree
run               created_at    btree
test_case_result  run_id        btree
test_case_result  test_case_id  btree
test_case_result  passed        btree
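
Assuming PostgreSQL (suggested by the jsonb columns and the GIN index type), the index table translates directly into DDL. The sketch below renders one statement per row; the ix_<table>_<column> naming is a hypothetical convention. A GIN index on the tags array supports containment queries such as tags @> ARRAY['billing'].

```python
# Rows mirror the Indexes table: (table, column, index type).
INDEXES = [
    ("test_case", "difficulty", "btree"),
    ("test_case", "status", "btree"),
    ("test_case", "tags", "GIN"),
    ("eval_set", "name", "btree"),
    ("eval_set_member", "eval_set_id", "btree"),
    ("run", "agent_id", "btree"),
    ("run", "eval_set_id", "btree"),
    ("run", "status", "btree"),
    ("run", "is_baseline", "btree"),
    ("run", "created_at", "btree"),
    ("test_case_result", "run_id", "btree"),
    ("test_case_result", "test_case_id", "btree"),
    ("test_case_result", "passed", "btree"),
]


def create_index_sql(table: str, column: str, method: str) -> str:
    """Render one PostgreSQL CREATE INDEX statement (naming scheme is hypothetical)."""
    return (
        f"CREATE INDEX IF NOT EXISTS ix_{table}_{column} "
        f"ON {table} USING {method.lower()} ({column});"
    )


if __name__ == "__main__":
    for row in INDEXES:
        print(create_index_sql(*row))
```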

Critical Design Decision

agent_system_prompt, agent_tools, and the related agent snapshot fields live on the Run entity (captured at eval time), NOT on TestCase. Each run also records agent_version / agent_version_id, pointing at the immutable AgentVersion row; these two fields and the AgentVersion entity are not shown in the diagram above. Together, these guarantee that every evaluation run preserves a complete snapshot of the agent configuration it was tested against, even if the Agent record is edited afterwards.
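
A sketch of the snapshot taken at run creation: the agent's configuration is copied field by field onto the new Run, with tools deep-copied, so later edits to the Agent row cannot alter what the run recorded. The dataclasses cover only fields shown in the diagram; agent_version / agent_version_id are omitted because their types are not given here, create_run is a hypothetical helper, and the initial "pending" status is an assumption.

```python
import copy
import uuid
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Agent:
    id: str
    name: str
    mode: str  # AgentMode: "endpoint" or "platform"
    endpoint_url: Optional[str] = None
    system_prompt: Optional[str] = None
    tools: list[dict[str, Any]] = field(default_factory=list)
    agent_model: Optional[str] = None
    agent_provider: Optional[str] = None


@dataclass
class Run:
    id: str
    agent_id: str
    eval_set_id: str
    eval_set_version: int
    agent_mode: str
    agent_endpoint_url: Optional[str]
    agent_system_prompt: Optional[str]
    agent_tools: list[dict[str, Any]]
    agent_model: Optional[str]
    agent_provider: Optional[str]
    status: str = "pending"  # assumed initial RunStatus


def create_run(agent: Agent, eval_set_id: str, eval_set_version: int) -> Run:
    """Snapshot the agent's current configuration onto a new run."""
    return Run(
        id=str(uuid.uuid4()),
        agent_id=agent.id,
        eval_set_id=eval_set_id,
        eval_set_version=eval_set_version,
        agent_mode=agent.mode,
        agent_endpoint_url=agent.endpoint_url,
        agent_system_prompt=agent.system_prompt,
        # Deep copy so mutating agent.tools later cannot leak into the snapshot.
        agent_tools=copy.deepcopy(agent.tools),
        agent_model=agent.agent_model,
        agent_provider=agent.agent_provider,
    )
```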