LEADERBOARD

Runs are ranked by the highest gate cleared, then by gated score within that gate. The gated score is based on passed core and scenario assertions. 125 runs across 32 scenarios.
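The two-level ordering described above can be sketched as a sort on a (gate, score) pair. This is only an illustration under stated assumptions: the gate ladder in `GATE_ORDER` is hypothetical (only "production" appears on this page), the `Run` type and the sample rows are made up, and the way the gated score is computed from assertions is not shown because the page does not specify it.

```python
from dataclasses import dataclass

# Hypothetical gate ladder, lowest to highest. Only "production" is named
# on the leaderboard; the other entries are placeholders for illustration.
GATE_ORDER = ["none", "functional", "production"]


@dataclass(frozen=True)
class Run:
    scenario: str
    agent: str
    gate: str           # highest gate the run cleared
    gated_score: float  # score within that gate, 0.0 to 1.0


def rank_runs(runs):
    """Order runs by highest gate cleared, then by gated score (both descending)."""
    return sorted(
        runs,
        key=lambda r: (GATE_ORDER.index(r.gate), r.gated_score),
        reverse=True,
    )


# Made-up sample data, not taken from the leaderboard.
runs = [
    Run("scenario-a", "agent-x", "functional", 0.99),
    Run("scenario-b", "agent-y", "production", 0.70),
    Run("scenario-c", "agent-z", "production", 1.00),
]
ranked = rank_runs(runs)
for position, run in enumerate(ranked, start=1):
    print(position, run.scenario, run.gate, run.gated_score)
```

Under this ordering a run that clears a higher gate always outranks one that stops at a lower gate, even when the lower-gate run has the higher gated score.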

#1 PRODUCTION: Ecommerce pipeline recovery (codex · classic-de)
#2 PRODUCTION: Idempotent pipeline (claude-code · classic-de)
#3 PRODUCTION: Idempotent pipeline (claude-code · classic-de)

ALL RANKINGS: 125 RUNS

# | SCENARIO | AGENT | HARNESS | GATE | GATED SCORE | TIME | COST
01 | Ecommerce pipeline recovery | codex · gpt-5.4 | classic-de | | 1.00 | 2m 31s | $2.52
02 | Idempotent pipeline | claude-code · opus-4-6 | classic-de | | 1.00 | 1m 20s | $0.36
03 | Idempotent pipeline | claude-code · sonnet-4-6 | classic-de | | 1.00 | 2m 33s | $0.42
04 | Idempotent pipeline | codex · gpt-5.4 | classic-de | | 1.00 | 3m 35s | $5.35
05 | Idempotent pipeline | cursor · composer-2 | classic-de | | 1.00 | 4m 54s | $0.57
06 | Ingest to api | cursor · composer-2 | base-rt | | 1.00 | 3m 9s | $0.20
07 | Metric definition backfill | codex · gpt-5.4 | classic-de | | 1.00 | 1m 30s | $2.18
08 | Metric definition backfill | cursor · composer-2 | classic-de | | 1.00 | 1m 28s | $0.09
09 | Multi dimensional query builder | cursor · composer-2 | classic-de | | 1.00 | 2m 47s | $0.12
10 | Oltp to olap migration | claude-code · opus-4-6 | classic-de | | 1.00 | 3m 11s | $0.58
11 | Oltp to olap migration | claude-code · sonnet-4-6 | classic-de | | 1.00 | 2m 6s | $0.36
12 | Oltp to olap migration | codex · gpt-5.4 | classic-de | | 1.00 | 4m 53s | $9.75
13 | Oltp to olap migration | cursor · composer-2 | classic-de | | 1.00 | 2m 17s | $0.23
14 | Quality gate | claude-code · sonnet-4-6 | classic-de | | 1.00 | 5m 51s | $0.55
15 | Schema evolution | claude-code · opus-4-6 | classic-de | | 1.00 | 1m 33s | $0.31
16 | Schema evolution | codex · gpt-5.4 | classic-de | | 1.00 | 3m 17s | $4.56
17 | Transform chain | claude-code · opus-4-6 | classic-de | | 1.00 | 2m 49s | $0.57
18 | Transform chain | claude-code · sonnet-4-6 | classic-de | | 1.00 | 3m 43s | $0.48
19 | Transform chain | codex · gpt-5.4 | classic-de | | 1.00 | 3m 52s | $4.27
20 | Broken connection | cursor · composer-2 | base-rt | | 0.99 | 2m 19s | $0.15
21 | Broken connection | cursor · composer | base-rt | | 0.99 | 13m 50s | $4.65
22 | Cdc postgres to redpanda | claude-code · opus-4-6 | classic-de | | 0.99 | 2m 48s | $0.67
23 | Cdc postgres to redpanda | claude-code · sonnet-4-6 | classic-de | | 0.99 | 3m 40s | $0.59
24 | Clickhouse historical backfill | claude-code · opus-4-6 | classic-de | | 0.99 | 1m 0s | $0.27
25 | Clickhouse historical backfill | codex · gpt-5.4 | classic-de | | 0.99 | 2m 31s | $7.33
26 | Clickhouse live schema migration | claude-code · opus-4-6 | classic-de | | 0.99 | 1m 0s | $0.31
27 | Clickhouse live schema migration | claude-code · sonnet-4-6 | classic-de | | 0.99 | 1m 47s | $0.32
28 | Clickhouse live schema migration | codex · gpt-5.4 | classic-de | | 0.99 | 1m 0s | $0.61
29 | Clickhouse orderby optimization | codex · gpt-5.4 | base-rt | | 0.99 | 1m 37s | $2.69
30 | Clickhouse ttl lifecycle | codex · gpt-5.4 | base-rt | | 0.99 | 1m 18s | $1.86
31 | Consistent metrics layer | claude-code · opus-4-6 | classic-de | | 0.99 | 1m 45s | $0.36
32 | Consistent metrics layer | claude-code · sonnet-4-6 | classic-de | | 0.99 | 1m 37s | $0.32
33 | Consistent metrics layer | codex · gpt-5.4 | classic-de | | 0.99 | 1m 13s | $2.34
34 | Consistent metrics layer | cursor · composer-2 | classic-de | | 0.99 | 1m 34s | $0.11
35 | Csv ingest | cursor · composer-2 | base-rt | | 0.99 | 2m 3s | $0.10
36 | Csv ingest | cursor · composer | base-rt | | 0.99 | 6m 25s | $0.89
37 | Derived composite metrics | claude-code · opus-4-6 | base-rt | | 0.99 | 1m 54s | $0.32
38 | Derived composite metrics | claude-code · sonnet-4-6 | base-rt | | 0.99 | 2m 4s | $0.32
39 | Derived composite metrics | cursor · composer-2 | base-rt | | 0.99 | 1m 47s | $0.11
40 | Event sourcing replay | claude-code · opus-4-6 | classic-de | | 0.99 | 2m 5s | $0.37
41 | Metric definition backfill | claude-code · opus-4-6 | classic-de | | 0.99 | 2m 21s | $0.48
42 | Metric definition backfill | claude-code · sonnet-4-6 | classic-de | | 0.99 | 2m 27s | $0.34
43 | Multi dimensional query builder | claude-code · opus-4-6 | classic-de | | 0.99 | 3m 44s | $0.97
44 | Paginated analytics api | cursor · composer-2 | classic-de | | 0.99 | 2m 1s | $0.07
45 | Postgres table partitioning | codex · gpt-5.4 | classic-de | | 0.99 | 2m 15s | $0.84
46 | Schema evolution | claude-code · sonnet-4-6 | classic-de | | 0.99 | 1m 29s | $0.25
47 | Schema evolution | cursor · composer-2 | classic-de | | 0.99 | 2m 12s | $0.13
48 | Slow queries | cursor · composer-2 | base-rt | | 0.99 | 3m 19s | $0.34
49 | Stream to olap | cursor · composer | classic-de | | 0.99 | 1m 3s | $3.41
50 | Table layout | claude-code · sonnet-4-6 | base-rt | | 0.99 | 2m 25s | $0.26
51 | Table layout | codex · gpt-5.4 | base-rt | | 0.99 | 1m 57s | $1.71
52 | Table layout | cursor · composer-2 | base-rt | | 0.99 | 1m 51s | $0.12
53 | Time grain rollups | claude-code · sonnet-4-6 | classic-de | | 0.99 | 3m 32s | $0.21
54 | Time grain rollups | codex · gpt-5.4 | classic-de | | 0.99 | 4m 43s | $6.71
55 | Transform chain | cursor · composer-2 | classic-de | | 0.99 | 2m 53s | $0.15
56 | Broken connection | codex · gpt-5.4 | base-rt | | 0.97 | 3m 12s | $1.70
57 | Clickhouse historical backfill | claude-code · sonnet-4-6 | classic-de | | 0.97 | 26s | $0.12
58 | Clickhouse historical backfill | cursor · composer-2 | classic-de | | 0.97 | 1m 41s | $0.18
59 | Clickhouse live schema migration | cursor · composer-2 | classic-de | | 0.97 | 1m 30s | $0.12
60 | Clickhouse orderby optimization | claude-code · opus-4-6 | base-rt | | 0.97 | 1m 33s | $0.32
61 | Clickhouse orderby optimization | claude-code · sonnet-4-6 | base-rt | | 0.97 | 1m 3s | $0.19
62 | Clickhouse orderby optimization | cursor · composer-2 | base-rt | | 0.97 | 1m 34s | $0.14
63 | Clickhouse ttl lifecycle | claude-code · opus-4-6 | base-rt | | 0.97 | 36s | $0.20
64 | Clickhouse ttl lifecycle | claude-code · sonnet-4-6 | base-rt | | 0.97 | 1m 12s | $0.24
65 | Clickhouse ttl lifecycle | cursor · composer-2 | base-rt | | 0.97 | 55s | $0.06
66 | Derived composite metrics | codex · gpt-5.4 | base-rt | | 0.97 | 2m 34s | $7.02
67 | Multi dimensional query builder | claude-code · sonnet-4-6 | classic-de | | 0.97 | 1m 48s | $0.32
68 | Postgres index tuning | cursor · composer | base-rt | | 0.97 | 14m 50s | $4.56
69 | Postgres readiness race | claude-code · opus-4-6 | base-rt | | 0.97 | 46s | $0.20
70 | Postgres readiness race | codex · gpt-5.4 | base-rt | | 0.97 | 2m 2s | $2.88
71 | Postgres readiness race | cursor · composer-2 | base-rt | | 0.97 | 55s | $0.06
72 | Postgres table partitioning | claude-code · opus-4-6 | classic-de | | 0.97 | 1m 33s | $0.25
73 | Postgres table partitioning | claude-code · sonnet-4-6 | classic-de | | 0.97 | 1m 43s | $0.24
74 | Postgres table partitioning | cursor · composer-2 | classic-de | | 0.97 | 1m 33s | $0.06
75 | Realtime streaming metrics | claude-code · sonnet-4-6 | classic-de | | 0.97 | 4m 16s | $0.60
76 | Slow analytical api | claude-code · opus-4-6 | base-rt | | 0.97 | 2m 22s | $0.43
77 | Slow analytical api | claude-code · sonnet-4-6 | base-rt | | 0.97 | 3m 12s | $0.53
78 | Slow analytical api | codex · gpt-5.4 | base-rt | | 0.97 | 2m 44s | $5.49
79 | Slow queries | claude-code · opus-4-6 | base-rt | | 0.97 | 1m 10s | $0.25
80 | Slow queries | claude-code · sonnet-4-6 | base-rt | | 0.97 | 1m 19s | $0.17
81 | Slow queries | codex · gpt-5.4 | base-rt | | 0.97 | 3m 46s | $2.08
82 | Stream to olap | claude-code · sonnet-4-6 | classic-de | | 0.97 | 2m 3s | $0.30
83 | Stream to olap | cursor · composer-2 | classic-de | | 0.97 | 4m 0s | $0.19
84 | Time grain rollups | claude-code · opus-4-6 | classic-de | | 0.97 | 3m 41s | $0.31
85 | Time grain rollups | cursor · composer-2 | classic-de | | 0.97 | 1m 12s | $0.06
86 | Time grain rollups | cursor · composer | classic-de | | 0.97 | 8m 9s | $1.20
87 | Analytics api concurrency | claude-code · opus-4-6 | classic-de | | 0.96 | 54s | $0.23
88 | Analytics api concurrency | claude-code · sonnet-4-6 | classic-de | | 0.96 | 1m 30s | $0.24
89 | Analytics api concurrency | cursor · composer-2 | classic-de | | 0.96 | 2m 9s | $0.19
90 | Cdc postgres to redpanda | codex · gpt-5.4 | classic-de | | 0.96 | 6m 21s | $8.24
91 | Consumer lag fix | claude-code · sonnet-4-6 | base-rt | | 0.96 | 2m 26s | $0.54
92 | Slow analytical api | cursor · composer-2 | base-rt | | 0.96 | 2m 12s | $0.20
93 | Stream to olap | codex · gpt-5.4 | classic-de | | 0.96 | 2m 51s | $2.71
94 | Broken connection | claude-code · sonnet-4-6 | base-rt | | 0.95 | 3m 3s | $0.17
95 | Ingest to api | claude-code · opus-4-6 | base-rt | | 0.93 | 4m 48s | $1.09
96 | Ingest to api | claude-code · sonnet-4-6 | base-rt | | 0.93 | 2m 36s | $0.31
97 | Paginated analytics api | claude-code · opus-4-6 | classic-de | | 0.92 | 1m 32s | $0.24
98 | Paginated analytics api | claude-code · sonnet-4-6 | classic-de | | 0.91 | 1m 7s | $0.16
99 | Csv ingest | codex · gpt-5.4 | base-rt | | 0.70 | 4m 4s | $5.74
100 | Postgres index tuning | claude-code · opus-4-6 | base-rt | | 0.70 | 1m 52s | $0.15
101 | Postgres index tuning | codex · gpt-5.4 | base-rt | | 0.70 | 3m 37s | $2.46
102 | Postgres index tuning | cursor · composer-2 | base-rt | | 0.70 | 4m 16s | $0.27
103 | Ecommerce pipeline recovery | cursor · composer-2 | classic-de | | 0.53 | 2m 57s | $0.29
104 | Quality gate | claude-code · opus-4-6 | classic-de | | 0.53 | 2m 45s | $0.49
105 | Quality gate | codex · gpt-5.4 | classic-de | | 0.53 | 3m 15s | $4.49
106 | Quality gate | cursor · composer-2 | classic-de | | 0.53 | 2m 47s | $0.25
107 | Cross system reconciliation | claude-code · opus-4-6 | classic-de | | 0.45 | 3m 6s | $0.80
108 | Ecommerce pipeline recovery | claude-code · opus-4-6 | classic-de | | 0.30 | 1m 24s | $0.30
109 | Ecommerce pipeline recovery | claude-code · sonnet-4-6 | classic-de | | 0.30 | 1m 35s | $0.26
110 | Realtime streaming metrics | cursor · composer-2 | classic-de | | 0.30 | 3m 11s | $0.50
111 | Consumer lag fix | cursor · composer-2 | base-rt | | 0.20 | 3m 15s | $0.22
112 | Postgres index tuning | claude-code · sonnet-4-6 | base-rt | | 0.20 | 2m 11s | $0.07
113 | Postgres readiness race | claude-code · sonnet-4-6 | base-rt | | 0.20 | 12s | $0.03
114 | Ingest to api | codex · gpt-5.4 | base-rt | | 0.16 | 2m 43s | $3.03
115 | Cross system reconciliation | codex · gpt-5.4 | classic-de | | 0.15 | 3m 44s | $6.51
116 | Cross system reconciliation | cursor · composer | classic-de | | 0.15 | 1m 36s | $2.62
117 | Analytics api concurrency | codex · gpt-5.4 | classic-de | | 0.10 | 1m 59s | $3.95
118 | Broken connection | claude-code · opus-4-6 | base-rt | | 0.10 | 3m 52s | $0.44
119 | Csv ingest | claude-code · opus-4-6 | base-rt | | 0.10 | 2m 45s | $0.15
120 | Csv ingest | claude-code · sonnet-4-6 | base-rt | | 0.10 | 2m 42s | $0.10
121 | Multi dimensional query builder | codex · gpt-5.4 | classic-de | | 0.10 | 2m 31s | $2.84
122 | Paginated analytics api | codex · gpt-5.4 | classic-de | | 0.10 | 2m 18s | $2.08
123 | Table layout | claude-code · opus-4-6 | base-rt | | 0.10 | 1m 7s | $0.25
124 | Cross system reconciliation | claude-code · sonnet-4-6 | classic-de | | 0.00 | 4m 0s |
125 | Stream to olap | claude-code · opus-4-6 | classic-de | | 0.00 | 10m 0s |
