Core

Core roadmap items for DEC Bench.

IN PROGRESS: v0.1, Internal Release

36 scenarios on the Foo Bar synthetic domain (13 tier-1, 18 tier-2, 5 tier-3), 3 harness configurations, installable CLI binaries, localhost audit flow, and gated scoring.

UP NEXT: v0.2
Real Business Domains

B2B SaaS, e-commerce, and user-generated content (UGC) domains with production-realistic data.

Leaderboard

Public leaderboard with results from eval runs across all agents and harnesses.

Multi-Agent Harnesses

Add Codex and Aider alongside Claude Code for direct comparison.

Tier 2-3 Scenarios

Multi-task scenarios requiring investigation and architecture decisions.

FUTURE
Multi-Engine Support

DuckDB and Snowflake environments alongside ClickHouse.

Streaming Evals

Real-time streaming pipeline correctness and throughput benchmarks.

DEC Bench Verified

Human-validated subset of ~100 scenarios with expert DBA review.

Community Contributions

Open scenario submission pipeline with automated validation.