DEC Bench
Run evidence, gate breakdowns, and full agent traces for every evaluated scenario.
Restore and harden a broken daily orders pipeline from Postgres into ClickHouse.
Runs
4
Agent
cursor
Model
composer-2
Fix an analytics HTTP API that fails or returns inconsistent results under concurrent requests. The API uses a single shared ClickHouse connection that is not thread-safe.
Runs
4
Agent
cursor
Model
composer-2
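One common way to avoid sharing a non-thread-safe connection is to give each worker thread its own. The sketch below illustrates that idea only; `FakeConnection`, `get_connection`, and `handle_request` are hypothetical names, and the fake connection stands in for a real ClickHouse client.

```python
import threading


class FakeConnection:
    """Hypothetical stand-in for a ClickHouse client connection."""

    def __init__(self):
        # Remember which thread created this connection.
        self.owner = threading.get_ident()

    def query(self, sql):
        # A real client would send the query; here we only check ownership.
        assert self.owner == threading.get_ident(), "connection shared across threads"
        return f"ok: {sql}"


_local = threading.local()


def get_connection():
    """Return this thread's own connection, creating it on first use."""
    conn = getattr(_local, "conn", None)
    if conn is None:
        conn = FakeConnection()
        _local.conn = conn
    return conn


def handle_request(sql):
    """Hypothetical request handler: each thread queries through its own connection."""
    return get_connection().query(sql)
```

A connection pool or a lock around the shared connection would be alternative designs; per-thread connections trade a few extra connections for lock-free reads.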
Diagnose and fix a misconfigured Postgres setup: wrong connection string, partial init script, and hardcoded credentials in a Python script.
Runs
5
Agent
cursor
Model
composer-2
Build a change-data-capture pipeline from Postgres to Redpanda: capture table changes and stream them to a Kafka topic for downstream consumers.
Runs
3
Agent
codex
Model
gpt-5.4
Fix a broken historical backfill pipeline. The target table has incorrect partition or order keys that cause slow inserts and failed backfills for historical date ranges.
Runs
4
Agent
cursor
Model
composer-2
Add a new column to a live ClickHouse table with existing data. The migration must not block reads/writes and must backfill the new column for existing rows.
Runs
4
Agent
cursor
Model
composer-2
A ClickHouse table has a suboptimal ORDER BY that causes full scans for common query patterns. Redesign the ORDER BY key to match filter columns (region, event_ts) and improve query latency.
Runs
4
Agent
cursor
Model
composer-2
Design a ClickHouse table with TTL so that raw events older than 90 days are automatically dropped. Create the table and verify the TTL expression is correct.
Runs
4
Agent
cursor
Model
composer-2
Fix inconsistent metric definitions across ClickHouse and Postgres. The same metric (e.g. daily_active_users) is computed differently in each store, causing reporting mismatches.
Runs
4
Agent
cursor
Model
composer-2
Diagnose and fix a Kafka consumer that cannot keep up with producer throughput. Consumer lag is growing; optimize batch size, parallelism, or fetch config so the sink table catches up.
Runs
2
Agent
cursor
Model
composer-2
Fix broken reconciliation between Postgres, Redpanda, and ClickHouse. Build a reusable reconciliation command that detects drift, writes a structured drift report, and fails with meaningful exit codes when systems are unavailable.
Runs
4
Agent
codex
Model
gpt-5.4
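The shape of such a reconciliation command can be sketched as follows. This is an assumption-laden illustration, not the scenario's reference solution: the exit-code values, the `reconcile` function, and the use of row counts as the only drift signal are all hypothetical choices.

```python
import json

# Hypothetical exit-code convention for the command.
EXIT_OK, EXIT_DRIFT, EXIT_UNAVAILABLE = 0, 1, 2


def reconcile(counters):
    """counters maps a system name to a zero-argument callable returning a row count."""
    counts, errors = {}, {}
    for name, fetch in counters.items():
        try:
            counts[name] = fetch()
        except ConnectionError as exc:
            # An unreachable system is reported separately from drift.
            errors[name] = str(exc)
    if errors:
        status, code = "unavailable", EXIT_UNAVAILABLE
    elif len(set(counts.values())) > 1:
        status, code = "drift", EXIT_DRIFT
    else:
        status, code = "in_sync", EXIT_OK
    report = {"status": status, "counts": counts, "errors": errors}
    return code, json.dumps(report, sort_keys=True)
```

A command-line wrapper could print the JSON report and pass the returned code to `sys.exit`, so that callers can distinguish drift from an outage.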
Load five messy CSV files into clean ClickHouse tables. Handle inconsistent dates, mixed null representations, duplicate headers, and trailing delimiters.
Runs
5
Agent
cursor
Model
composer-2
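The four problems named above can be handled in a single cleaning pass before loading. The following is a minimal sketch under stated assumptions: the null spellings, the accepted date formats, and the `order_date` column name are hypothetical, and the actual scenario files may differ.

```python
import csv
import io
from datetime import datetime

NULL_TOKENS = {"", "null", "n/a", "na", "none", "-"}  # assumed null spellings
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%m-%d-%Y")   # assumed input formats


def parse_date(value):
    """Try each assumed format and return an ISO date string, or None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_csv(text, date_columns=("order_date",)):
    rows = list(csv.reader(io.StringIO(text)))
    header = [h.strip() for h in rows[0]]
    while header and header[-1] == "":
        header.pop()  # a trailing delimiter yields an empty last column
    cleaned = []
    for raw in rows[1:]:
        cells = [c.strip() for c in raw[:len(header)]]
        if not cells or cells == header:
            continue  # skip blank lines and repeated header rows
        record = {}
        for name, cell in zip(header, cells):
            value = None if cell.lower() in NULL_TOKENS else cell
            if value is not None and name in date_columns:
                value = parse_date(value)
            record[name] = value
        cleaned.append(record)
    return cleaned
```

The cleaned dictionaries could then be inserted into ClickHouse with whichever client the pipeline uses.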
Create a semantic layer with derived and composite metrics. Build materialized views or a metrics table that computes conversion_rate, avg_order_value, and daily_active_users from raw events.
Runs
4
Agent
cursor
Model
composer-2
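To make the three metrics concrete, here is a small sketch that computes them per day from raw events. The definitions are assumptions chosen for illustration (conversion_rate as purchases divided by visits, avg_order_value as revenue divided by purchases, daily_active_users as distinct users with any event); the scenario may define them differently, and the event field names are hypothetical.

```python
from collections import defaultdict


def daily_metrics(events):
    """events: iterable of dicts with keys day, user_id, type, and amount (purchases only)."""
    acc = defaultdict(lambda: {"users": set(), "visits": 0, "purchases": 0, "revenue": 0.0})
    for e in events:
        day = acc[e["day"]]
        day["users"].add(e["user_id"])
        if e["type"] == "visit":
            day["visits"] += 1
        elif e["type"] == "purchase":
            day["purchases"] += 1
            day["revenue"] += e["amount"]
    out = {}
    for d, a in acc.items():
        out[d] = {
            "daily_active_users": len(a["users"]),
            # Ratios are None when the denominator is zero.
            "conversion_rate": a["purchases"] / a["visits"] if a["visits"] else None,
            "avg_order_value": a["revenue"] / a["purchases"] if a["purchases"] else None,
        }
    return out
```

In ClickHouse the same aggregation would typically live in a materialized view or a metrics table, so that every consumer reads one shared definition.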
Fix a broken event-sourcing setup: the event log in Redpanda is out of sync with the materialized view in Postgres. Replay events to restore consistency.

Runs
1
Agent
claude-code
Model
opus-4-6
Build a pipeline from Postgres to ClickHouse that produces identical results whether run once or three times. Handle updates to existing rows by primary key.
Runs
4
Agent
cursor
Model
composer-2
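The idempotence requirement comes down to writing each row by primary key, so that a rerun overwrites instead of appending. The sketch below shows the property with SQLite from the Python standard library as a stand-in target; the `orders` table and both function names are hypothetical. In ClickHouse, a ReplacingMergeTree table keyed on the primary key is one commonly used analogue.

```python
import sqlite3


def sync_orders(target, source_rows):
    """Upsert rows by primary key; repeated runs leave the target unchanged."""
    target.execute(
        "CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, status TEXT, amount REAL)"
    )
    # INSERT OR REPLACE overwrites any existing row with the same primary key.
    target.executemany(
        "INSERT OR REPLACE INTO orders (id, status, amount) VALUES (?, ?, ?)",
        source_rows,
    )
    target.commit()


def snapshot(target):
    """Return the full table contents in a deterministic order."""
    return target.execute("SELECT id, status, amount FROM orders ORDER BY id").fetchall()
```

Running `sync_orders` once or three times with the same source rows yields the same snapshot, and a changed source row replaces its earlier version.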
Build a complete pipeline: ingest raw product-interaction events from Postgres, transform and load into ClickHouse, and expose a lightweight HTTP API that serves pre-aggregated insights (top products, conversion funnel, hourly trends).
Runs
4
Agent
cursor
Model
composer-2
Fix a failed backfill of a new metric definition. The backfill job partially ran, left orphaned data, and the metric table has gaps or duplicates. Recover and complete the backfill idempotently.
Runs
4
Agent
cursor
Model
composer-2
Build a query builder API that supports filtering and grouping by multiple dimensions (date, region, product). Choose appropriate storage (ClickHouse vs Postgres) per data type and expose a unified query interface.
Runs
4
Agent
claude-code
Model
sonnet-4-6
Fix a broken migration from Postgres (OLTP) to ClickHouse (OLAP). The sync job fails, the schema is mismatched, and the incremental load is not working. Restore the migration pipeline.
Runs
4
Agent
claude-code
Model
opus-4-6
Build a paginated analytics API that serves large result sets from ClickHouse with cursor-based or offset pagination. Support page size and consistent ordering.
Runs
4
Agent
cursor
Model
composer-2
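Cursor-based (keyset) pagination depends on a total ordering and a cursor that records the last row returned. The following sketch shows the logic over an in-memory list; `fetch_page` and the `(event_ts, id)` row shape are assumptions for illustration. Against ClickHouse the filter would become a `WHERE (event_ts, id) > (...)` clause with a matching `ORDER BY` and `LIMIT`.

```python
def fetch_page(rows, page_size, cursor=None):
    """rows: list of (event_ts, id) tuples; cursor: the last tuple seen, or None."""
    ordered = sorted(rows)  # a total order on (event_ts, id) keeps pages stable
    if cursor is not None:
        # Keep only rows strictly after the cursor.
        ordered = [r for r in ordered if r > cursor]
    page = ordered[:page_size]
    # A next cursor is returned only when more rows remain.
    next_cursor = page[-1] if len(ordered) > page_size else None
    return page, next_cursor
```

Including a unique `id` as the tie-breaker is what keeps rows with equal timestamps from being skipped or repeated across pages.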
Add indexes to a Postgres table with 500k rows. Queries filtering by user_id and created_at are slow; create appropriate indexes to meet latency targets.
Runs
5
Agent
cursor
Model
composer-2
Fix a startup race: the app connects to Postgres immediately and fails when the database is not yet ready. Add retry/backoff logic so the app waits for Postgres to be ready.
Runs
4
Agent
cursor
Model
composer-2
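A retry loop with exponential backoff is one straightforward way to wait for the database. This is a minimal sketch, assuming the driver raises `ConnectionError` (real Postgres drivers raise their own exception types); `connect_with_retry` and its parameters are hypothetical names.

```python
import time


def connect_with_retry(connect, attempts=5, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call connect() until it succeeds, doubling the wait after each failure."""
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts:
                raise  # give up after the final attempt
            sleep(delay)
            delay = min(delay * 2, max_delay)  # cap the backoff
```

Passing `sleep` as a parameter keeps the function testable without real waiting; adding random jitter to the delay is a common refinement when many clients start at once.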
Fix a large unpartitioned Postgres table that is slow for date-range queries. Partition by date and migrate existing data.
Runs
4
Agent
codex
Model
gpt-5.4
Add data quality checks to a working pipeline. A chaos script injects nulls, duplicates, schema drift, and stale timestamps. The agent must detect all four problems without false positives.
Runs
4
Agent
cursor
Model
composer-2
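The four checks can be sketched as one pass over a batch of rows. Everything specific here is an assumption made for illustration: the expected column set, the 24-hour staleness threshold, and the `check_batch` name are hypothetical and not taken from the scenario.

```python
from datetime import datetime, timedelta

EXPECTED_COLUMNS = {"id", "user_id", "event_ts"}  # assumed schema


def check_batch(rows, now, max_age=timedelta(hours=24)):
    """Return the set of problem names found in a batch of dict rows."""
    problems = set()
    seen_ids = set()
    for row in rows:
        if set(row) != EXPECTED_COLUMNS:
            problems.add("schema_drift")
        # Only columns that are present can be null; missing ones count as drift.
        if any(row[c] is None for c in EXPECTED_COLUMNS & set(row)):
            problems.add("nulls")
        rid = row.get("id")
        if rid is not None:
            if rid in seen_ids:
                problems.add("duplicates")
            seen_ids.add(rid)
        ts = row.get("event_ts")
        if ts is not None and now - ts > max_age:
            problems.add("stale_timestamps")
    return problems
```

Returning an empty set for a clean batch is the property that guards against false positives.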
Fix a broken realtime metrics pipeline: Redpanda streams metrics, but the ClickHouse materialized view is stale or failing. Restore sub-second aggregation from stream to OLAP.
Runs
2
Agent
claude-code
Model
sonnet-4-6
Add a column to a synthetic dimension table, create the corresponding ClickHouse target, build a cross-database view, and verify old queries still work.
Runs
4
Agent
cursor
Model
composer-2
An HTTP API serves analytical queries from ClickHouse but responses take 2+ seconds. Optimize the queries, add materialized views, or pre-aggregate so /api/metrics and /api/breakdown return in under 200ms.
Runs
4
Agent
cursor
Model
composer-2
Rewrite five slow analytical queries against a 10M-row ClickHouse table to run under latency thresholds without changing result sets.
Runs
4
Agent
cursor
Model
composer-2
Consume events from a Redpanda topic and load them into ClickHouse for analytical queries. Build a streaming-to-OLAP pipeline with appropriate table design.
Runs
5
Agent
cursor
Model
composer-2
Redesign a naive ClickHouse table with 5M rows to serve three representative query patterns. Set partition keys, order keys, and compression for target latencies.
Runs
4
Agent
cursor
Model
composer-2
Fix broken time-grain rollup tables (hourly, daily, monthly) that have incorrect aggregations or missing grains. The rollups should support dashboard queries at different time resolutions.
Runs
5
Agent
cursor
Model
composer-2
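A rollup at each grain amounts to truncating timestamps to the bucket start and aggregating per bucket. The sketch below uses a sum as the aggregate, which is an assumption; `truncate` and `rollup` are hypothetical helper names.

```python
from collections import defaultdict
from datetime import datetime


def truncate(ts, grain):
    """Return the start of the hour, day, or month that contains ts."""
    if grain == "hourly":
        return ts.replace(minute=0, second=0, microsecond=0)
    if grain == "daily":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if grain == "monthly":
        return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unknown grain: {grain}")


def rollup(events, grain):
    """events: iterable of (timestamp, value); returns {bucket_start: summed value}."""
    totals = defaultdict(float)
    for ts, value in events:
        totals[truncate(ts, grain)] += value
    return dict(totals)
```

A useful consistency check for additive metrics is that the grand total is identical at every grain, which catches missing buckets and double counting.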
Build a three-layer transformation from raw JSON events in Postgres to a sessionized, aggregated daily mart in ClickHouse.
Runs
4
Agent
cursor
Model
composer-2
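The sessionization step in the middle layer is often implemented as gap-based grouping: a new session starts whenever a user is idle longer than a timeout. The sketch below assumes a 30-minute inactivity gap and a `(user_id, timestamp)` event shape, neither of which is specified by the scenario; `sessionize` is a hypothetical name.

```python
from datetime import datetime, timedelta


def sessionize(events, gap=timedelta(minutes=30)):
    """events: iterable of (user_id, timestamp); returns (user_id, start, end, count) tuples."""
    sessions = []
    last_user, start, end, count = None, None, None, 0
    # Sorting groups events by user, then orders them by time.
    for user, ts in sorted(events):
        if user == last_user and ts - end <= gap:
            end, count = ts, count + 1  # extend the current session
        else:
            if last_user is not None:
                sessions.append((last_user, start, end, count))
            last_user, start, end, count = user, ts, ts, 1  # open a new session
    if last_user is not None:
        sessions.append((last_user, start, end, count))
    return sessions
```

The resulting session rows could then be aggregated by day to form the mart layer.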