Core
Core roadmap items for DEC Bench.
IN PROGRESS
v0.1: Internal Release
36 scenarios on the Foo Bar synthetic domain (13 tier-1, 18 tier-2, 5 tier-3), 3 harness configurations, installable CLI binaries, localhost audit flow, and gated scoring.
UP NEXT
v0.2
Real Business Domains
B2B SaaS, E-commerce, and UGC domains with production-realistic data.
Leaderboard
Public leaderboard with results from eval runs across all agents and harnesses.
Multi-Agent Harnesses
Add Codex and Aider alongside Claude Code for direct comparison.
Tier 2-3 Scenarios
Multi-task scenarios requiring investigation and architecture decisions.
FUTURE
Multi-Engine Support
DuckDB and Snowflake environments alongside ClickHouse.
Streaming Evals
Real-time streaming pipeline correctness and throughput benchmarks.
DEC Bench Verified
Human-validated subset of ~100 scenarios with expert DBA review.
Community Contributions
Open scenario submission pipeline with automated validation.