
Harnesses

Each harness defines the tooling environment a scenario runs against.

Each scenario runs inside a Docker container with pre-installed tools. The harness controls which tools are available to the agent, so you can compare how agents perform with different toolkits against the same problem. v0.1 ships with three harnesses.

LLM + Agent harness = Standard agent

A model (Claude, GPT, Gemini) paired with an agent CLI (Claude Code, Cursor, Codex) that gives it bash, file I/O, and tool use. Standard agents can be extended with skills, MCPs, and hooks.

Standard agent + Harness plugins = Specialized agent

Skills, MCPs, and pre-installed tools (dbt, Airflow, MooseStack) that turn a general-purpose agent into one specialized for a domain such as data engineering.

Base RT

The control group.

No extra tooling beyond the base infrastructure. The agent has Python, Node.js, and CLI access to the three foundational services: ClickHouse (warehouse), Redpanda (streaming), and Postgres (transactional).
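One way to confirm what the agent can actually reach in this harness is to probe for each client on the PATH. The sketch below is illustrative only: `missing_tools` is a hypothetical helper, and the binary names `clickhouse-client`, `rpk`, `psql`, `python3`, and `node` are assumptions, since the page does not list the exact executables shipped in the image.

```shell
#!/bin/sh
# Hypothetical helper: report which of the given commands are missing from PATH.
# Prints one missing name per line and returns non-zero if any are missing.
missing_tools() {
  rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      rc=1
    fi
  done
  return "$rc"
}

# Assumed binary names for the base infrastructure clients and runtimes.
BASE_TOOLS="clickhouse-client rpk psql python3 node"

# Usage inside the container:
#   missing_tools $BASE_TOOLS || echo "base image is incomplete"
```

Because the function only inspects the PATH, it says nothing about whether the services themselves are reachable; a connectivity check would need each client's own ping or query command.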

Infrastructure

Base infrastructure only: ClickHouse, Redpanda, Postgres, Python 3, Node.js 22

Install script

>_ harnesses/base-rt.sh
# Uses base image defaults only
Network: open | 48 scenarios

Classic Data Engineering

The standard toolkit.

The standard data engineering toolkit: Airflow for orchestration, Spark for distributed processing, and dbt for warehouse transformations. The agent wires together independent best-in-class tools on top of the base infrastructure.

Installed tools

  • Apache Airflow 2.10.5
  • PySpark 3.5.5
  • dbt-core 1.10.19
  • dbt-postgres 1.10.0
  • dbt-clickhouse 1.10.0
  • OpenJDK 17

Install script

>_ harnesses/classic-de.sh
apt-get update &&
apt-get install -y --no-install-recommends openjdk-17-jre-headless &&
rm -rf /var/lib/apt/lists/* &&
pip3 install --no-cache-dir --break-system-packages dbt-core==1.10.19 dbt-postgres==1.10.0 dbt-clickhouse==1.10.0 apache-airflow==2.10.5 pyspark==3.5.5
Network: open | 48 scenarios
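Since the install script pins exact versions, it can be useful to check that the container really ended up with them. The following is a minimal sketch, not part of the harness itself: `expect_version` is a hypothetical helper that runs a command and looks for the expected version string in its output. The commands in the usage comments (`airflow version`, `dbt --version`, `java -version`, and reading `pyspark.__version__`) are the tools' standard version queries.

```shell
#!/bin/sh
# Hypothetical helper: compare an expected version string against command output.
# Usage: expect_version LABEL EXPECTED COMMAND [ARGS...]
expect_version() {
  label=$1
  expected=$2
  shift 2
  # Capture stderr too, because some tools (java) print the version there.
  actual=$("$@" 2>&1) || { echo "FAIL $label: command failed"; return 1; }
  case $actual in
    *"$expected"*) echo "ok   $label $expected" ;;
    *) echo "FAIL $label: wanted $expected"; return 1 ;;
  esac
}

# Usage inside the Classic Data Engineering container:
#   expect_version airflow  2.10.5  airflow version
#   expect_version pyspark  3.5.5   python3 -c 'import pyspark; print(pyspark.__version__)'
#   expect_version dbt-core 1.10.19 dbt --version
#   expect_version java     17      java -version
```

The match is a plain substring test, so a short expected value such as `17` can match unrelated text in the output; pass the longest version string you know when a stricter check matters.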

OLAP for Software Engineers

The unified framework.

A software-engineering-first approach to data engineering on top of the base infrastructure. MooseStack provides typed schemas, automated migrations, a built-in MCP server, and a hot-reloading dev environment. The agent works within a unified framework instead of wiring together independent tools.

Installed tools

  • moose-cli 0.6.424
  • moose-lib 0.6.424

Install script

>_ harnesses/olap-for-swe.sh
npm install -g @514labs/moose-cli@0.6.424 &&
mkdir -p /opt/dec-bench/moose &&
cd /opt/dec-bench/moose &&
npm init -y &&
npm install @514labs/moose-lib@0.6.424
Network: open | 48 scenarios

At a Glance

| | Base RT | Classic Data Engineering | OLAP for Software Engineers |
|---|---|---|---|
| Orchestration | | Apache Airflow 2.10.5 | |
| Transformation | | dbt-core 1.10.19, dbt-postgres 1.10.0, dbt-clickhouse 1.10.0 | |
| Processing | | PySpark 3.5.5 | |
| Framework | | | moose-cli 0.6.424, moose-lib 0.6.424 |
| Runtime | | OpenJDK 17 | |
| Network | open | open | open |
| Scenarios | 48 | 48 | 48 |

Contribute a harness

Write a shell script, pin versions, and open a PR.
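Before opening a PR, it may help to check that every package in a new install script carries an exact version pin, as the shipped harnesses do. The sketch below is an assumption about how such a check could look, not a tool from the project: `is_pinned` is a hypothetical helper that recognizes the pip style (`name==1.2.3`) and the npm style (`name@1.2.3`) used on this page.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if a package spec carries an exact version pin.
is_pinned() {
  case $1 in
    *==[0-9]*) return 0 ;;   # pip style, e.g. dbt-core==1.10.19
    ?*@[0-9]*) return 0 ;;   # npm style, e.g. @514labs/moose-cli@0.6.424
    *) return 1 ;;           # bare names, ranges, and tags such as @latest
  esac
}

# Usage: flag any unpinned spec in a draft harness.
#   for spec in dbt-core==1.10.19 pyspark; do
#     is_pinned "$spec" || echo "unpinned: $spec"
#   done
```

The npm pattern requires at least one character before the `@`, so a scoped name without a version (`@514labs/moose-cli`) is correctly reported as unpinned.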

Getting started guide