DEC Bench

Audit

Run evidence, gate breakdowns, and full agent traces for every evaluated scenario.

E-commerce Pipeline Recovery

Restore and harden a broken daily orders pipeline from Postgres into ClickHouse.

2/5 · 53%

Runs

4

Agent

cursor

Model

composer-2

Feb 7, 03:17 PM · 4 runs
Foo Bar Analytics API Concurrency

Fix an analytics HTTP API that fails or returns inconsistent results under concurrent requests. The API uses a single shared ClickHouse connection that is not thread-safe.

4/5 · 96%

Runs

4

Agent

cursor

Model

composer-2

Feb 13, 04:35 AM · 4 runs
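
One possible direction for the concurrency scenario above is to stop sharing a single connection and give each request thread its own. The sketch below is only an illustration under assumptions: `PerThreadConnections` and its `factory` argument are hypothetical names, and the factory stands in for whatever call opens a ClickHouse connection in the real API.

```python
import threading


class PerThreadConnections:
    """Hand each thread its own connection instead of sharing one."""

    def __init__(self, factory):
        # factory is a hypothetical zero-argument callable that opens a connection.
        self._factory = factory
        self._local = threading.local()

    def get(self):
        # threading.local keeps a separate "conn" attribute per thread.
        conn = getattr(self._local, "conn", None)
        if conn is None:
            conn = self._factory()
            self._local.conn = conn
        return conn
```

A connection pool or a lock around the shared connection would be alternative fixes; per-thread connections avoid serializing requests at the cost of more open connections.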
Foo Bar Broken Connection

Diagnose and fix a misconfigured Postgres setup: wrong connection string, partial init script, and hardcoded credentials in a Python script.

4/5 · 99%

Runs

5

Agent

cursor

Model

composer-2

Feb 14, 07:16 PM · 5 runs
Foo Bar CDC Postgres to Redpanda

Build a change-data-capture pipeline from Postgres to Redpanda: capture table changes and stream them to a Kafka topic for downstream consumers.

4/5 · 96%

Runs

3

Agent

codex

Model

gpt-5.4

Feb 21, 10:46 AM · 3 runs
Foo Bar ClickHouse Historical Backfill

Fix a broken historical backfill pipeline. The target table has incorrect partition or order keys that cause slow inserts and failed backfills for historical date ranges.

4/5 · 97%

Runs

4

Agent

cursor

Model

composer-2

Mar 5, 05:49 PM · 4 runs
Foo Bar ClickHouse Live Schema Migration

Add a new column to a live ClickHouse table with existing data. The migration must not block reads/writes and must backfill the new column for existing rows.

4/5 · 97%

Runs

4

Agent

cursor

Model

composer-2

Mar 9, 10:40 PM · 4 runs
Foo Bar ClickHouse ORDER BY Optimization

A ClickHouse table has a suboptimal ORDER BY that causes full scans for common query patterns. Redesign the ORDER BY key to match filter columns (region, event_ts) and improve query latency.

4/5 · 97%

Runs

4

Agent

cursor

Model

composer-2

Mar 14, 05:10 PM · 4 runs
Foo Bar ClickHouse TTL Lifecycle

Design a ClickHouse table with TTL so that raw events older than 90 days are automatically dropped. Create the table and verify the TTL expression is correct.

4/5 · 97%

Runs

4

Agent

cursor

Model

composer-2

Mar 18, 08:49 AM · 4 runs
Foo Bar Consistent Metrics Layer

Fix inconsistent metric definitions across ClickHouse and Postgres. The same metric (e.g. daily_active_users) is computed differently in each store, causing reporting mismatches.

4/5 · 99%

Runs

4

Agent

cursor

Model

composer-2

Mar 22, 03:22 PM · 4 runs
Foo Bar Consumer Lag Fix

Diagnose and fix a Kafka consumer that cannot keep up with producer throughput. Consumer lag is growing; optimize batch size, parallelism, or fetch config so the sink table catches up.

1/5 · 20%

Runs

2

Agent

cursor

Model

composer-2

Jun 7, 03:45 PM · 2 runs
Foo Bar Cross-System Reconciliation

Fix broken reconciliation between Postgres, Redpanda, and ClickHouse. Build a reusable reconciliation command that detects drift, writes a structured drift report, and fails with meaningful exit codes when systems are unavailable.

0/5 · 15%

Runs

4

Agent

codex

Model

gpt-5.4

Sep 27, 08:17 AM · 4 runs
Foo Bar CSV Ingest

Load five messy CSV files into clean ClickHouse tables. Handle inconsistent dates, mixed null representations, duplicate headers, and trailing delimiters.

4/5 · 99%

Runs

5

Agent

cursor

Model

composer-2

Jun 7, 04:52 PM · 5 runs
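
The cleanup steps named in the CSV ingest scenario (inconsistent dates, mixed null representations, duplicate headers, trailing delimiters) could be sketched roughly as follows. The null spellings, the date formats, and the function names are all assumptions made for illustration; the scenario's actual files may use different conventions.

```python
import csv
import io
from datetime import datetime

# Assumed null spellings and date formats; the real files may differ.
NULL_TOKENS = {"", "null", "none", "n/a", "na", "-"}
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d-%b-%Y")


def clean_value(raw):
    """Map the assumed null spellings to None; otherwise strip whitespace."""
    value = raw.strip()
    return None if value.lower() in NULL_TOKENS else value


def parse_date(raw):
    """Try each assumed format and return an ISO date string, or None."""
    value = clean_value(raw)
    if value is None:
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_rows(text, date_column):
    """Yield cleaned dict rows, skipping repeated headers and trailing delimiters."""
    header = None
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue
        if header is None:
            # Dropping empty names also removes a trailing-delimiter column.
            header = [f.strip() for f in fields if f.strip()]
            continue
        # A trailing delimiter produces one extra empty field at the end.
        if len(fields) == len(header) + 1 and fields[-1] == "":
            fields = fields[:-1]
        if [f.strip() for f in fields] == header:
            continue  # duplicate header row
        if len(fields) != len(header):
            continue  # malformed row; a real loader might log it instead
        row = {name: clean_value(value) for name, value in zip(header, fields)}
        row[date_column] = parse_date(fields[header.index(date_column)])
        yield row
```

Rows cleaned this way would still need to be inserted into ClickHouse with a client of choice; that step is left out here.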
Foo Bar Derived Composite Metrics

Create a semantic layer with derived and composite metrics. Build materialized views or a metrics table that computes conversion_rate, avg_order_value, and daily_active_users from raw events.

4/5 · 99%

Runs

4

Agent

cursor

Model

composer-2

Jun 9, 02:48 AM · 4 runs
Foo Bar Event Sourcing Replay

Fix a broken event-sourcing setup: the event log in Redpanda is out of sync with the materialized view in Postgres. Replay events to restore consistency.

4/5 · 99%

Runs

1

Agent

claude-code

Model

opus-4-6

Jun 10, 12:03 AM
Foo Bar Idempotent Pipeline

Build a pipeline from Postgres to ClickHouse that produces identical results whether run once or three times. Handle updates to existing rows by primary key.

5/5 · 100%

Runs

4

Agent

cursor

Model

composer-2

Jun 12, 02:05 AM · 4 runs
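
The idempotency requirement in the scenario above (identical results after one run or three, with updates applied by primary key) can be illustrated with a small upsert sketch. This is not the scenario's solution: an in-memory SQLite database stands in for the ClickHouse target, and the `orders` table, its columns, and the function names are hypothetical.

```python
import sqlite3


def sync_orders(source_rows, target):
    """Upsert rows by primary key so reruns leave the target unchanged."""
    target.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(id INTEGER PRIMARY KEY, status TEXT, amount REAL)"
    )
    # INSERT OR REPLACE overwrites the row that shares the primary key,
    # so an updated source row replaces its older copy instead of duplicating it.
    target.executemany(
        "INSERT OR REPLACE INTO orders (id, status, amount) VALUES (?, ?, ?)",
        source_rows,
    )
    target.commit()


def snapshot(target):
    """Return the full table in a stable order for comparison between runs."""
    return target.execute(
        "SELECT id, status, amount FROM orders ORDER BY id"
    ).fetchall()
```

Comparing `snapshot` output after one run and after three runs is a simple way to check the property. In ClickHouse the same effect would need a different mechanism, for example a table engine that deduplicates by key.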
Foo Bar Ingest-to-Insights API

Build a complete pipeline: ingest raw product-interaction events from Postgres, transform and load into ClickHouse, and expose a lightweight HTTP API that serves pre-aggregated insights (top products, conversion funnel, hourly trends).

5/5 · 100%

Runs

4

Agent

cursor

Model

composer-2

Jun 12, 02:39 AM · 4 runs
Foo Bar Metric Definition Backfill

Fix a failed backfill of a new metric definition. The backfill job partially ran, left orphaned data, and the metric table has gaps or duplicates. Recover and complete the backfill idempotently.

5/5 · 100%

Runs

4

Agent

cursor

Model

composer-2

Jun 14, 01:22 AM · 4 runs
Foo Bar Multi-Dimensional Query Builder

Build a query builder API that supports filtering and grouping by multiple dimensions (date, region, product). Choose appropriate storage (ClickHouse vs Postgres) per data type and expose a unified query interface.

4/5 · 97%

Runs

4

Agent

claude-code

Model

sonnet-4-6

Jun 15, 04:37 AM · 4 runs
Foo Bar OLTP to OLAP Migration

Fix a broken migration from Postgres (OLTP) to ClickHouse (OLAP). The sync job fails, the schema is mismatched, and incremental load is not working. Restore the migration pipeline.

5/5 · 100%

Runs

4

Agent

claude-code

Model

opus-4-6

Jun 16, 12:42 AM · 4 runs
Foo Bar Paginated Analytics API

Build a paginated analytics API that serves large result sets from ClickHouse with cursor-based or offset pagination. Support page size and consistent ordering.

4/5 · 99%

Runs

4

Agent

cursor

Model

composer-2

Jun 17, 01:39 PM · 4 runs
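
Cursor-based pagination with consistent ordering, as the scenario above asks for, is often done as keyset pagination: sort by a unique key and return rows strictly after the last key seen. The sketch below works on an in-memory list rather than ClickHouse, and the `event_ts` and `id` field names and the `fetch_page` function are assumptions for illustration.

```python
def fetch_page(rows, page_size, cursor=None):
    """Return (page, next_cursor) using keyset pagination over (event_ts, id)."""
    # Sorting on (event_ts, id) gives a total order even when timestamps tie.
    ordered = sorted(rows, key=lambda r: (r["event_ts"], r["id"]))
    if cursor is not None:
        # Keep only rows strictly after the last row of the previous page.
        ordered = [r for r in ordered if (r["event_ts"], r["id"]) > cursor]
    page = ordered[:page_size]
    has_more = len(ordered) > page_size
    next_cursor = (page[-1]["event_ts"], page[-1]["id"]) if has_more else None
    return page, next_cursor
```

A caller passes the returned `next_cursor` back in until it comes back as `None`. Against a database the filter and limit would move into the query itself, which avoids the growing cost of large offsets.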
Foo Bar Postgres Index Tuning

Add indexes to a Postgres table with 500k rows. Queries filtering by user_id and created_at are slow; create appropriate indexes to meet latency targets.

3/5 · 70%

Runs

5

Agent

cursor

Model

composer-2

Jun 17, 04:41 PM · 5 runs
Foo Bar Postgres Readiness Race

Fix a startup race: the app connects to Postgres immediately and fails when the database is not yet ready. Add retry/backoff logic so the app waits for Postgres to be ready.

4/5 · 97%

Runs

4

Agent

cursor

Model

composer-2

Jun 18, 07:31 AM · 4 runs
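
The retry/backoff logic this scenario calls for might look something like the sketch below. It is a generic illustration, not the scenario's reference fix: `wait_for_database` and its parameters are hypothetical, `connect` stands in for whatever call opens the Postgres connection, and a real driver would raise its own exception type rather than the built-in `ConnectionError` caught here.

```python
import time


def wait_for_database(connect, attempts=5, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep):
    """Call connect() until it succeeds, doubling the delay after each failure."""
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)
            # Exponential backoff, capped so waits do not grow without bound.
            delay = min(delay * 2, max_delay)
```

The `sleep` parameter exists only so the waiting can be replaced in tests; production code would leave it at the default.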
Foo Bar Postgres Table Partitioning

Fix a large unpartitioned Postgres table that is slow for date-range queries. Partition by date and migrate existing data.

4/5 · 99%

Runs

4

Agent

codex

Model

gpt-5.4

Jun 19, 06:35 AM · 4 runs
Foo Bar Quality Gate

Add data quality checks to a working pipeline. A chaos script injects nulls, duplicates, schema drift, and stale timestamps. The agent must detect all four problems without false positives.

2/5 · 53%

Runs

4

Agent

cursor

Model

composer-2

Jun 20, 10:50 AM · 4 runs
Foo Bar Realtime Streaming Metrics

Fix a broken realtime metrics pipeline: Redpanda streams metrics, but the ClickHouse materialized view is stale or failing. Restore sub-second aggregation from stream to OLAP.

4/5 · 97%

Runs

2

Agent

claude-code

Model

sonnet-4-6

Jun 21, 06:12 AM · 2 runs
Foo Bar Schema Evolution

Add a column to a synthetic dimension table, create the corresponding ClickHouse target, build a cross-database view, and verify old queries still work.

4/5 · 99%

Runs

4

Agent

cursor

Model

composer-2

Jun 22, 05:57 PM · 4 runs
Foo Bar Slow Analytical API

An HTTP API serves analytical queries from ClickHouse but responses take 2+ seconds. Optimize the queries, add materialized views, or pre-aggregate so /api/metrics and /api/breakdown return in under 200ms.

4/5 · 96%

Runs

4

Agent

cursor

Model

composer-2

Jun 23, 08:03 PM · 4 runs
Foo Bar Slow Queries

Rewrite five slow analytical queries against a 10M-row ClickHouse table to run under latency thresholds without changing result sets.

4/5 · 99%

Runs

4

Agent

cursor

Model

composer-2

Jun 25, 12:37 AM · 4 runs
Foo Bar Stream to OLAP

Consume events from a Redpanda topic and load them into ClickHouse for analytical queries. Build a streaming-to-OLAP pipeline with appropriate table design.

4/5 · 97%

Runs

5

Agent

cursor

Model

composer-2

Jun 26, 04:51 AM · 5 runs
Foo Bar Table Layout

Redesign a naive ClickHouse table with 5M rows to serve three representative query patterns. Set partition keys, order keys, and compression for target latencies.

4/5 · 99%

Runs

4

Agent

cursor

Model

composer-2

Jun 27, 06:21 AM · 4 runs
Foo Bar Time Grain Rollups

Fix broken time-grain rollup tables (hourly, daily, monthly) that have incorrect aggregations or missing grains. The rollups should support dashboard queries at different time resolutions.

4/5 · 97%

Runs

5

Agent

cursor

Model

composer-2

Jun 28, 01:30 AM · 5 runs
Foo Bar Transform Chain

Build a three-layer transformation from raw JSON events in Postgres to a sessionized, aggregated daily mart in ClickHouse.

4/5 · 99%

Runs

4

Agent

cursor

Model

composer-2

Jun 29, 10:20 AM · 4 runs