DOCUMENTATION

Overview

One command. Real infrastructure. JSON results.

Run a benchmark:

docker run \
  -e ANTHROPIC_API_KEY=sk-ant-... \
  ghcr.io/514-labs/dec-bench:foo-bar-csv-ingest.base-rt.claude-code.sonnet-4.v0.1.0

JSON hits stdout. Nothing else to install, configure, or deploy.
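
Because the result arrives on stdout, it can be redirected straight into a file. The following is a minimal sketch, not part of the benchmark itself: `run_benchmark` and the output path are hypothetical names, and it assumes `ANTHROPIC_API_KEY` is already exported in the calling shell.

```shell
# Hypothetical helper (not shipped with the image): run one benchmark image
# and save whatever it prints on stdout to a file.
# Assumes ANTHROPIC_API_KEY is already exported in the calling shell.
#
# Usage: run_benchmark <image-reference> <output-file>
run_benchmark() {
  image="$1"
  out="$2"
  # `-e NAME` with no value forwards NAME from the current environment,
  # which keeps the key off the command line; `--rm` removes the
  # container once it exits.
  docker run --rm -e ANTHROPIC_API_KEY "$image" > "$out"
}
```

For example, `run_benchmark ghcr.io/514-labs/dec-bench:foo-bar-csv-ingest.base-rt.claude-code.sonnet-4.v0.1.0 result.json` would leave the JSON in `result.json`. Only stdout is redirected, so anything the container writes to stderr still shows up in the terminal.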

How It Works

Every image tag encodes four variables, plus the release version:

Image tag format
{scenario}.{harness}.{agent}.{model}.{version}
| Variable | What it is | Examples |
| --- | --- | --- |
| Scenario | A data engineering problem with infrastructure, seed data, a prompt, and validation | foo-bar-csv-ingest, foo-bar-time-grain-rollups, foo-bar-cross-system-reconciliation |
| Harness | Tools pre-installed for the agent | base-rt, classic-de, olap-for-swe |
| Agent | The AI coding agent | Claude Code (v0.1), with Codex and Aider coming next |
| Model | The LLM behind the agent | claude-sonnet-4-20250514, haiku-4.5 |

Each combination builds a distinct, immutable container. No runtime flags. The image tag is the configuration.
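
Since the tag is the configuration, a script can recover the configuration by splitting the tag. The sketch below is one possible way to do that in POSIX shell. `parse_tag` is a hypothetical helper, and it rests on assumptions the documentation does not state: scenario, harness, and agent contain no dots, and the version is the last field and begins with `v`. The model is taken as everything in between, so a dotted name such as `haiku-4.5` would stay intact.

```shell
# Hypothetical helper: split an image tag of the form
#   {scenario}.{harness}.{agent}.{model}.{version}
# into its five fields.
# Assumptions (not confirmed by the docs): scenario, harness, and agent
# contain no dots; the version is the final field and starts with "v".
parse_tag() {
  tag="$1"
  # Peel the first three dot-separated fields off the left.
  scenario="${tag%%.*}"; rest="${tag#*.}"
  harness="${rest%%.*}"; rest="${rest#*.}"
  agent="${rest%%.*}";   rest="${rest#*.}"
  # What remains is "<model>.v<version>"; the model may itself contain dots.
  model="${rest%.v*}"        # drop the trailing ".v<version>"
  version="v${rest##*.v}"    # keep what follows the last ".v"
  printf 'scenario=%s\nharness=%s\nagent=%s\nmodel=%s\nversion=%s\n' \
    "$scenario" "$harness" "$agent" "$model" "$version"
}
```

Calling `parse_tag foo-bar-csv-ingest.base-rt.claude-code.sonnet-4.v0.1.0` prints `scenario=foo-bar-csv-ingest`, `harness=base-rt`, `agent=claude-code`, `model=sonnet-4`, and `version=v0.1.0`, one per line. A plain split on every dot would not work here, because the version itself contains dots.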

v0.1 Release

The v0.1 release includes 36 Foo Bar scenarios (13 tier-1, 18 tier-2, 5 tier-3), three harness configurations (Base RT, Classic DE, OLAP for SWE), and Claude Code as the primary agent.

Next Steps