Leaderboard · chi-bench-v1.0.0
CHI-Bench Leaderboard
Χ-Bench evaluates agents on long-horizon, policy-rich U.S. healthcare workflows across three domains: provider prior authorization (PA), payer utilization management (UM), and care management (CM). Each domain ships 25 tasks, scored by an automated workspace judge under pass@1 with a binary 0/1 reward. Submissions below are ranked by accuracy on the selected domain.
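The scoring described above can be sketched in a few lines. Under pass@1 with a binary reward, a domain's accuracy is the fraction of its tasks whose single attempt scored 1. This is a minimal illustration, not the leaderboard's implementation: the helper names `pass_at_1`, `score`, and `rank`, the result layout, and the sample data are all hypothetical. The pooled "All" score is also an assumption, since the page does not say how the all-domains figure is aggregated. Because every domain has 25 tasks, pooling all tasks gives the same value as averaging the three domain accuracies.

```python
# Minimal sketch of pass@1 scoring with binary 0/1 rewards (hypothetical helpers).
# Assumption: "All" pools every task; with equal task counts per domain this
# equals the unweighted mean of the per-domain accuracies.

def pass_at_1(rewards):
    """Fraction of tasks whose single attempt earned reward 1."""
    if not rewards or any(r not in (0, 1) for r in rewards):
        raise ValueError("expected a non-empty list of 0/1 rewards")
    return sum(rewards) / len(rewards)


def score(results):
    """Per-domain pass@1 plus a pooled 'All' score for one submission."""
    scores = {domain: pass_at_1(r) for domain, r in results.items()}
    scores["All"] = pass_at_1([r for rs in results.values() for r in rs])
    return scores


def rank(submissions, domain="All"):
    """Submission names sorted by pass@1 on `domain`, best first."""
    scored = {name: score(res) for name, res in submissions.items()}
    return sorted(scored, key=lambda name: scored[name][domain], reverse=True)


# Hypothetical results: 4 tasks per domain instead of 25, for brevity.
submissions = {
    "agent-a": {"PA": [1, 0, 1, 1], "UM": [0, 0, 1, 1], "CM": [1, 1, 1, 0]},
    "agent-b": {"PA": [1, 1, 1, 1], "UM": [0, 0, 0, 1], "CM": [0, 1, 0, 0]},
}
print(rank(submissions))        # ['agent-a', 'agent-b']
print(rank(submissions, "PA"))  # ['agent-b', 'agent-a']
```

Here agent-a leads overall (8/12 vs. 6/12), while agent-b leads on PA alone (4/4 vs. 3/4). This shows why the ordering depends on the selected domain.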
Org | Agent | Model | Type | Accuracy | PA | UM | CM | Evidence | Date
Submissions are ranked by pass@1 accuracy across all domains.
Got results?
Submit your agent to the CHI-Bench leaderboard.
Run the evaluation suite with your harness and model, prepare a submission packet with cb submission prepare, and open a pull request. CI re-runs the validator, and a maintainer merges the PR within one business day.