Bring your own agent

The same submission flow works for built-in and custom agents; only the harness wiring differs.

Built-in agents

`--agent`	Example `--model`	Paper rows
`claude-code`	`anthropic/claude-opus-4-7`	Claude Code
`codex`	`openai/gpt-5.5`	Codex
`gemini-cli`	`gemini/gemini-3-pro-preview`	Gemini CLI
`openclaw`	`anthropic/claude-opus-4-7`	OpenClaw
`hermes`	`openrouter/z-ai/glm-5.1`	Hermes
`openai-agents`	`deepseek/deepseek-v4-pro`	OAI Agents
`deepagents`	`openrouter/x-ai/grok-4.3`	DeepAgents

The full 30-row matrix lives in configs/experiments/table1_main_matrix.yaml.

What an agent harness needs to provide

A harness is a Python class under src/chi_bench/experiment/agents/ that implements three things:

A constructor that receives the per-task instruction.md, the role-scoped MCP server URL, the model identifier, and any provider credentials.
An async run() that drives the agent loop until it terminates (success, failure, or budget exhausted).
A trajectory writer that emits one JSONL record per step into agent/trajectory.jsonl following the ATIF schema (Agent Trajectory Interchange Format).
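
The three responsibilities above could be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's actual base class: `SketchHarness`, its parameter names, the `out_dir` argument, the placeholder `_step` method, and the record fields are all hypothetical, and the real ATIF schema defines its own fields.

```python
import asyncio
import json
import tempfile
from pathlib import Path


class SketchHarness:
    """Hypothetical harness shape; the real AgentHarness base class may differ."""

    def __init__(self, instruction: str, mcp_url: str, model: str,
                 credentials: dict[str, str], out_dir: Path, max_steps: int = 3):
        # Inputs the text says a harness receives (names are assumptions).
        self.instruction = instruction
        self.mcp_url = mcp_url
        self.model = model
        self.credentials = credentials
        self.max_steps = max_steps
        self.trajectory_path = out_dir / "agent" / "trajectory.jsonl"
        self.trajectory_path.parent.mkdir(parents=True, exist_ok=True)

    def write_step(self, record: dict) -> None:
        # One JSON object per line, appended as each step completes.
        with self.trajectory_path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")

    async def _step(self, index: int) -> dict:
        # Placeholder for a real model call plus MCP tool use.
        await asyncio.sleep(0)
        return {"step": index, "model": self.model,
                "done": index == self.max_steps - 1}

    async def run(self) -> str:
        # Drive the loop until the agent finishes or the step budget runs out.
        for index in range(self.max_steps):
            record = await self._step(index)
            self.write_step(record)
            if record["done"]:
                return "success"
        return "budget_exhausted"


def demo() -> tuple[str, int]:
    # The MCP URL here is a made-up local placeholder.
    with tempfile.TemporaryDirectory() as tmp:
        harness = SketchHarness("Do the task.", "http://localhost:8000/mcp",
                                "anthropic/claude-opus-4-7", {}, Path(tmp))
        status = asyncio.run(harness.run())
        lines = harness.trajectory_path.read_text(encoding="utf-8").splitlines()
        return status, len(lines)
```

Appending one line per step, rather than writing the file once at the end, means a crashed or budget-exhausted run still leaves a partial trajectory behind.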

Wiring a new harness

Create src/chi_bench/experiment/agents/<your_agent>.py, subclassing the base AgentHarness. Point it at the MCP server using the URL passed to the constructor.
Register it in src/chi_bench/experiment/agents/__init__.py so --agent <your_agent> resolves it on the CLI.
Smoke-test with cb experiment run --agent <your_agent> --model ... on a single task; confirm that the resulting verifier/scorecard.json is readable.
Run a full submission via cb submission run -f configs/submissions/<id>.yaml.
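
How the registration step works inside agents/__init__.py is not shown here. One common pattern is a name-to-class registry that the CLI consults when resolving --agent. The sketch below assumes that pattern; `AGENT_REGISTRY`, `register`, `resolve`, the stand-in `AgentHarness` class, and the agent name `my-agent` are all hypothetical.

```python
class AgentHarness:
    """Stand-in for the real base class, so this sketch runs on its own."""


# Hypothetical registry: CLI name -> harness class.
AGENT_REGISTRY: dict[str, type[AgentHarness]] = {}


def register(name: str):
    """Class decorator that records a harness under its CLI name."""
    def decorator(cls: type[AgentHarness]) -> type[AgentHarness]:
        if name in AGENT_REGISTRY:
            raise ValueError(f"agent {name!r} is already registered")
        AGENT_REGISTRY[name] = cls
        return cls
    return decorator


@register("my-agent")
class MyAgent(AgentHarness):
    """A custom harness; the real one would implement run() and so on."""


def resolve(name: str) -> type[AgentHarness]:
    """Look up the class for an --agent value, with a readable error."""
    try:
        return AGENT_REGISTRY[name]
    except KeyError:
        known = ", ".join(sorted(AGENT_REGISTRY))
        raise ValueError(f"unknown agent {name!r}; known agents: {known}") from None
```

Failing fast with the list of known names makes a typo in --agent obvious during the smoke test rather than partway through a full submission.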

Custom model endpoints

Most harnesses route through provider SDKs (Anthropic, OpenAI, Google, OpenRouter) keyed by the --model prefix. To add a new provider:

Add a model-resolver entry that maps <provider>/<model-id> to a client constructor.
Add the provider's API key handling to .env.example and document it in the README.
If the endpoint is OpenAI-compatible, you can usually reuse the existing codex or openai-agents harness by setting OPENAI_BASE_URL to that endpoint. The exception is the judge subprocess, which always uses the real Anthropic API.
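
A prefix-keyed resolver might look like the sketch below. It assumes the provider is everything before the first slash, which is consistent with example values such as `openrouter/z-ai/glm-5.1`. `ClientSpec`, `RESOLVERS`, `split_model`, `resolve_model`, and the environment-variable names are illustrative assumptions, not the project's actual resolver.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ClientSpec:
    """Hypothetical description of how to build a provider client."""
    provider: str
    model_id: str
    api_key_env: str  # name of the environment variable holding the key


def split_model(model: str) -> tuple[str, str]:
    """Split '<provider>/<model-id>' on the first slash only."""
    provider, sep, model_id = model.partition("/")
    if not sep or not provider or not model_id:
        raise ValueError(f"expected '<provider>/<model-id>', got {model!r}")
    return provider, model_id


# Hypothetical resolver table; adding a provider means adding one entry.
RESOLVERS: dict[str, Callable[[str], ClientSpec]] = {
    "anthropic": lambda m: ClientSpec("anthropic", m, "ANTHROPIC_API_KEY"),
    "openrouter": lambda m: ClientSpec("openrouter", m, "OPENROUTER_API_KEY"),
}


def resolve_model(model: str) -> ClientSpec:
    """Map a --model value to a client spec via its provider prefix."""
    provider, model_id = split_model(model)
    try:
        return RESOLVERS[provider](model_id)
    except KeyError:
        raise ValueError(f"no resolver registered for provider {provider!r}") from None
```

Splitting on the first slash only keeps nested identifiers intact, so `openrouter/z-ai/glm-5.1` resolves to provider `openrouter` with model id `z-ai/glm-5.1`.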

Authoritative docs

docs/extending.md — full walkthrough with code examples.
docs/cli.md — every CLI flag and exit-code convention.
Up next: Submit your agent to the leaderboard

The packet shape is identical regardless of whether you submit a built-in agent or a custom one. The leaderboard PR flow doesn't care how the trials were produced, only that the manifest, results CSV, and per-trial evidence pass the validator.