Runs are ranked by the highest gate cleared, then by gated score within that gate; the gated score is based on passed core and scenario assertions. The table covers 125 runs across 32 scenarios.
| # | SCENARIO | AGENT · MODEL | HARNESS | GATE | GATED SCORE | TIME | COST |
|---|---|---|---|---|---|---|---|
| 01 | Ecommerce pipeline recovery | codex · gpt-5.4 | classic-de | | 1.00 | 2m 31s | $2.52 |
| 02 | Idempotent pipeline | claude-code · opus-4-6 | classic-de | | 1.00 | 1m 20s | $0.36 |
| 03 | Idempotent pipeline | claude-code · sonnet-4-6 | classic-de | | 1.00 | 2m 33s | $0.42 |
| 04 | Idempotent pipeline | codex · gpt-5.4 | classic-de | | 1.00 | 3m 35s | $5.35 |
| 05 | Idempotent pipeline | cursor · composer-2 | classic-de | | 1.00 | 4m 54s | $0.57 |
| 06 | Ingest to api | cursor · composer-2 | base-rt | | 1.00 | 3m 9s | $0.20 |
| 07 | Metric definition backfill | codex · gpt-5.4 | classic-de | | 1.00 | 1m 30s | $2.18 |
| 08 | Metric definition backfill | cursor · composer-2 | classic-de | | 1.00 | 1m 28s | $0.09 |
| 09 | Multi dimensional query builder | cursor · composer-2 | classic-de | | 1.00 | 2m 47s | $0.12 |
| 10 | Oltp to olap migration | claude-code · opus-4-6 | classic-de | | 1.00 | 3m 11s | $0.58 |
| 11 | Oltp to olap migration | claude-code · sonnet-4-6 | classic-de | | 1.00 | 2m 6s | $0.36 |
| 12 | Oltp to olap migration | codex · gpt-5.4 | classic-de | | 1.00 | 4m 53s | $9.75 |
| 13 | Oltp to olap migration | cursor · composer-2 | classic-de | | 1.00 | 2m 17s | $0.23 |
| 14 | Quality gate | claude-code · sonnet-4-6 | classic-de | | 1.00 | 5m 51s | $0.55 |
| 15 | Schema evolution | claude-code · opus-4-6 | classic-de | | 1.00 | 1m 33s | $0.31 |
| 16 | Schema evolution | codex · gpt-5.4 | classic-de | | 1.00 | 3m 17s | $4.56 |
| 17 | Transform chain | claude-code · opus-4-6 | classic-de | | 1.00 | 2m 49s | $0.57 |
| 18 | Transform chain | claude-code · sonnet-4-6 | classic-de | | 1.00 | 3m 43s | $0.48 |
| 19 | Transform chain | codex · gpt-5.4 | classic-de | | 1.00 | 3m 52s | $4.27 |
| 20 | Broken connection | cursor · composer-2 | base-rt | | 0.99 | 2m 19s | $0.15 |
| 21 | Broken connection | cursor · composer | base-rt | | 0.99 | 13m 50s | $4.65 |
| 22 | Cdc postgres to redpanda | claude-code · opus-4-6 | classic-de | | 0.99 | 2m 48s | $0.67 |
| 23 | Cdc postgres to redpanda | claude-code · sonnet-4-6 | classic-de | | 0.99 | 3m 40s | $0.59 |
| 24 | Clickhouse historical backfill | claude-code · opus-4-6 | classic-de | | 0.99 | 1m 0s | $0.27 |
| 25 | Clickhouse historical backfill | codex · gpt-5.4 | classic-de | | 0.99 | 2m 31s | $7.33 |
| 26 | Clickhouse live schema migration | claude-code · opus-4-6 | classic-de | | 0.99 | 1m 0s | $0.31 |
| 27 | Clickhouse live schema migration | claude-code · sonnet-4-6 | classic-de | | 0.99 | 1m 47s | $0.32 |
| 28 | Clickhouse live schema migration | codex · gpt-5.4 | classic-de | | 0.99 | 1m 0s | $0.61 |
| 29 | Clickhouse orderby optimization | codex · gpt-5.4 | base-rt | | 0.99 | 1m 37s | $2.69 |
| 30 | Clickhouse ttl lifecycle | codex · gpt-5.4 | base-rt | | 0.99 | 1m 18s | $1.86 |
| 31 | Consistent metrics layer | claude-code · opus-4-6 | classic-de | | 0.99 | 1m 45s | $0.36 |
| 32 | Consistent metrics layer | claude-code · sonnet-4-6 | classic-de | | 0.99 | 1m 37s | $0.32 |
| 33 | Consistent metrics layer | codex · gpt-5.4 | classic-de | | 0.99 | 1m 13s | $2.34 |
| 34 | Consistent metrics layer | cursor · composer-2 | classic-de | | 0.99 | 1m 34s | $0.11 |
| 35 | Csv ingest | cursor · composer-2 | base-rt | | 0.99 | 2m 3s | $0.10 |
| 36 | Csv ingest | cursor · composer | base-rt | | 0.99 | 6m 25s | $0.89 |
| 37 | Derived composite metrics | claude-code · opus-4-6 | base-rt | | 0.99 | 1m 54s | $0.32 |
| 38 | Derived composite metrics | claude-code · sonnet-4-6 | base-rt | | 0.99 | 2m 4s | $0.32 |
| 39 | Derived composite metrics | cursor · composer-2 | base-rt | | 0.99 | 1m 47s | $0.11 |
| 40 | Event sourcing replay | claude-code · opus-4-6 | classic-de | | 0.99 | 2m 5s | $0.37 |
| 41 | Metric definition backfill | claude-code · opus-4-6 | classic-de | | 0.99 | 2m 21s | $0.48 |
| 42 | Metric definition backfill | claude-code · sonnet-4-6 | classic-de | | 0.99 | 2m 27s | $0.34 |
| 43 | Multi dimensional query builder | claude-code · opus-4-6 | classic-de | | 0.99 | 3m 44s | $0.97 |
| 44 | Paginated analytics api | cursor · composer-2 | classic-de | | 0.99 | 2m 1s | $0.07 |
| 45 | Postgres table partitioning | codex · gpt-5.4 | classic-de | | 0.99 | 2m 15s | $0.84 |
| 46 | Schema evolution | claude-code · sonnet-4-6 | classic-de | | 0.99 | 1m 29s | $0.25 |
| 47 | Schema evolution | cursor · composer-2 | classic-de | | 0.99 | 2m 12s | $0.13 |
| 48 | Slow queries | cursor · composer-2 | base-rt | | 0.99 | 3m 19s | $0.34 |
| 49 | Stream to olap | cursor · composer | classic-de | | 0.99 | 1m 3s | $3.41 |
| 50 | Table layout | claude-code · sonnet-4-6 | base-rt | | 0.99 | 2m 25s | $0.26 |
| 51 | Table layout | codex · gpt-5.4 | base-rt | | 0.99 | 1m 57s | $1.71 |
| 52 | Table layout | cursor · composer-2 | base-rt | | 0.99 | 1m 51s | $0.12 |
| 53 | Time grain rollups | claude-code · sonnet-4-6 | classic-de | | 0.99 | 3m 32s | $0.21 |
| 54 | Time grain rollups | codex · gpt-5.4 | classic-de | | 0.99 | 4m 43s | $6.71 |
| 55 | Transform chain | cursor · composer-2 | classic-de | | 0.99 | 2m 53s | $0.15 |
| 56 | Broken connection | codex · gpt-5.4 | base-rt | | 0.97 | 3m 12s | $1.70 |
| 57 | Clickhouse historical backfill | claude-code · sonnet-4-6 | classic-de | | 0.97 | 26s | $0.12 |
| 58 | Clickhouse historical backfill | cursor · composer-2 | classic-de | | 0.97 | 1m 41s | $0.18 |
| 59 | Clickhouse live schema migration | cursor · composer-2 | classic-de | | 0.97 | 1m 30s | $0.12 |
| 60 | Clickhouse orderby optimization | claude-code · opus-4-6 | base-rt | | 0.97 | 1m 33s | $0.32 |
| 61 | Clickhouse orderby optimization | claude-code · sonnet-4-6 | base-rt | | 0.97 | 1m 3s | $0.19 |
| 62 | Clickhouse orderby optimization | cursor · composer-2 | base-rt | | 0.97 | 1m 34s | $0.14 |
| 63 | Clickhouse ttl lifecycle | claude-code · opus-4-6 | base-rt | | 0.97 | 36s | $0.20 |
| 64 | Clickhouse ttl lifecycle | claude-code · sonnet-4-6 | base-rt | | 0.97 | 1m 12s | $0.24 |
| 65 | Clickhouse ttl lifecycle | cursor · composer-2 | base-rt | | 0.97 | 55s | $0.06 |
| 66 | Derived composite metrics | codex · gpt-5.4 | base-rt | | 0.97 | 2m 34s | $7.02 |
| 67 | Multi dimensional query builder | claude-code · sonnet-4-6 | classic-de | | 0.97 | 1m 48s | $0.32 |
| 68 | Postgres index tuning | cursor · composer | base-rt | | 0.97 | 14m 50s | $4.56 |
| 69 | Postgres readiness race | claude-code · opus-4-6 | base-rt | | 0.97 | 46s | $0.20 |
| 70 | Postgres readiness race | codex · gpt-5.4 | base-rt | | 0.97 | 2m 2s | $2.88 |
| 71 | Postgres readiness race | cursor · composer-2 | base-rt | | 0.97 | 55s | $0.06 |
| 72 | Postgres table partitioning | claude-code · opus-4-6 | classic-de | | 0.97 | 1m 33s | $0.25 |
| 73 | Postgres table partitioning | claude-code · sonnet-4-6 | classic-de | | 0.97 | 1m 43s | $0.24 |
| 74 | Postgres table partitioning | cursor · composer-2 | classic-de | | 0.97 | 1m 33s | $0.06 |
| 75 | Realtime streaming metrics | claude-code · sonnet-4-6 | classic-de | | 0.97 | 4m 16s | $0.60 |
| 76 | Slow analytical api | claude-code · opus-4-6 | base-rt | | 0.97 | 2m 22s | $0.43 |
| 77 | Slow analytical api | claude-code · sonnet-4-6 | base-rt | | 0.97 | 3m 12s | $0.53 |
| 78 | Slow analytical api | codex · gpt-5.4 | base-rt | | 0.97 | 2m 44s | $5.49 |
| 79 | Slow queries | claude-code · opus-4-6 | base-rt | | 0.97 | 1m 10s | $0.25 |
| 80 | Slow queries | claude-code · sonnet-4-6 | base-rt | | 0.97 | 1m 19s | $0.17 |
| 81 | Slow queries | codex · gpt-5.4 | base-rt | | 0.97 | 3m 46s | $2.08 |
| 82 | Stream to olap | claude-code · sonnet-4-6 | classic-de | | 0.97 | 2m 3s | $0.30 |
| 83 | Stream to olap | cursor · composer-2 | classic-de | | 0.97 | 4m 0s | $0.19 |
| 84 | Time grain rollups | claude-code · opus-4-6 | classic-de | | 0.97 | 3m 41s | $0.31 |
| 85 | Time grain rollups | cursor · composer-2 | classic-de | | 0.97 | 1m 12s | $0.06 |
| 86 | Time grain rollups | cursor · composer | classic-de | | 0.97 | 8m 9s | $1.20 |
| 87 | Analytics api concurrency | claude-code · opus-4-6 | classic-de | | 0.96 | 54s | $0.23 |
| 88 | Analytics api concurrency | claude-code · sonnet-4-6 | classic-de | | 0.96 | 1m 30s | $0.24 |
| 89 | Analytics api concurrency | cursor · composer-2 | classic-de | | 0.96 | 2m 9s | $0.19 |
| 90 | Cdc postgres to redpanda | codex · gpt-5.4 | classic-de | | 0.96 | 6m 21s | $8.24 |
| 91 | Consumer lag fix | claude-code · sonnet-4-6 | base-rt | | 0.96 | 2m 26s | $0.54 |
| 92 | Slow analytical api | cursor · composer-2 | base-rt | | 0.96 | 2m 12s | $0.20 |
| 93 | Stream to olap | codex · gpt-5.4 | classic-de | | 0.96 | 2m 51s | $2.71 |
| 94 | Broken connection | claude-code · sonnet-4-6 | base-rt | | 0.95 | 3m 3s | $0.17 |
| 95 | Ingest to api | claude-code · opus-4-6 | base-rt | | 0.93 | 4m 48s | $1.09 |
| 96 | Ingest to api | claude-code · sonnet-4-6 | base-rt | | 0.93 | 2m 36s | $0.31 |
| 97 | Paginated analytics api | claude-code · opus-4-6 | classic-de | | 0.92 | 1m 32s | $0.24 |
| 98 | Paginated analytics api | claude-code · sonnet-4-6 | classic-de | | 0.91 | 1m 7s | $0.16 |
| 99 | Csv ingest | codex · gpt-5.4 | base-rt | | 0.70 | 4m 4s | $5.74 |
| 100 | Postgres index tuning | claude-code · opus-4-6 | base-rt | | 0.70 | 1m 52s | $0.15 |
| 101 | Postgres index tuning | codex · gpt-5.4 | base-rt | | 0.70 | 3m 37s | $2.46 |
| 102 | Postgres index tuning | cursor · composer-2 | base-rt | | 0.70 | 4m 16s | $0.27 |
| 103 | Ecommerce pipeline recovery | cursor · composer-2 | classic-de | | 0.53 | 2m 57s | $0.29 |
| 104 | Quality gate | claude-code · opus-4-6 | classic-de | | 0.53 | 2m 45s | $0.49 |
| 105 | Quality gate | codex · gpt-5.4 | classic-de | | 0.53 | 3m 15s | $4.49 |
| 106 | Quality gate | cursor · composer-2 | classic-de | | 0.53 | 2m 47s | $0.25 |
| 107 | Cross system reconciliation | claude-code · opus-4-6 | classic-de | | 0.45 | 3m 6s | $0.80 |
| 108 | Ecommerce pipeline recovery | claude-code · opus-4-6 | classic-de | | 0.30 | 1m 24s | $0.30 |
| 109 | Ecommerce pipeline recovery | claude-code · sonnet-4-6 | classic-de | | 0.30 | 1m 35s | $0.26 |
| 110 | Realtime streaming metrics | cursor · composer-2 | classic-de | | 0.30 | 3m 11s | $0.50 |
| 111 | Consumer lag fix | cursor · composer-2 | base-rt | | 0.20 | 3m 15s | $0.22 |
| 112 | Postgres index tuning | claude-code · sonnet-4-6 | base-rt | | 0.20 | 2m 11s | $0.07 |
| 113 | Postgres readiness race | claude-code · sonnet-4-6 | base-rt | | 0.20 | 12s | $0.03 |
| 114 | Ingest to api | codex · gpt-5.4 | base-rt | | 0.16 | 2m 43s | $3.03 |
| 115 | Cross system reconciliation | codex · gpt-5.4 | classic-de | | 0.15 | 3m 44s | $6.51 |
| 116 | Cross system reconciliation | cursor · composer | classic-de | | 0.15 | 1m 36s | $2.62 |
| 117 | Analytics api concurrency | codex · gpt-5.4 | classic-de | | 0.10 | 1m 59s | $3.95 |
| 118 | Broken connection | claude-code · opus-4-6 | base-rt | | 0.10 | 3m 52s | $0.44 |
| 119 | Csv ingest | claude-code · opus-4-6 | base-rt | | 0.10 | 2m 45s | $0.15 |
| 120 | Csv ingest | claude-code · sonnet-4-6 | base-rt | | 0.10 | 2m 42s | $0.10 |
| 121 | Multi dimensional query builder | codex · gpt-5.4 | classic-de | | 0.10 | 2m 31s | $2.84 |
| 122 | Paginated analytics api | codex · gpt-5.4 | classic-de | | 0.10 | 2m 18s | $2.08 |
| 123 | Table layout | claude-code · opus-4-6 | base-rt | | 0.10 | 1m 7s | $0.25 |
| 124 | Cross system reconciliation | claude-code · sonnet-4-6 | classic-de | | 0.00 | 4m 0s | — |
| 125 | Stream to olap | claude-code · opus-4-6 | classic-de | | 0.00 | 10m 0s | — |
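
The ranking rule stated above the table (highest gate cleared first, then gated score within that gate) can be sketched as a two-key sort. This is a minimal illustration, not the benchmark's own code: the `Run` type, its field names, the integer encoding of gates, and the placeholder runs are all assumptions made for the example, and no gate values are taken from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    # All fields are hypothetical names chosen for this sketch.
    scenario: str
    agent: str
    gate: int           # assumed encoding: index of the highest gate cleared
    gated_score: float  # score within the reached gate, 0.0 to 1.0


def rank(runs):
    """Order runs by highest gate cleared, then by gated score, both descending."""
    return sorted(runs, key=lambda r: (-r.gate, -r.gated_score))


# Placeholder runs, not rows from the table.
runs = [
    Run("scenario-a", "agent-x", gate=2, gated_score=0.40),
    Run("scenario-b", "agent-y", gate=3, gated_score=0.10),
    Run("scenario-c", "agent-z", gate=2, gated_score=0.95),
]

ranked = rank(runs)
for position, run in enumerate(ranked, start=1):
    print(f"{position:02d} {run.scenario} {run.agent} "
          f"gate={run.gate} score={run.gated_score:.2f}")
```

Under these assumptions, a run that clears a higher gate outranks any run at a lower gate regardless of score, and the gated score only breaks ties between runs at the same gate.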