
Foo Bar (Dummy)

Synthetic domain for the v0.1 release. Thirty-six scenarios spanning ingestion, schema design, query optimization, reliability, observability, streaming, and cross-system reconciliation.

The Foo Bar domain is the foundation of the v0.1 release. It uses synthetic data with no production business semantics, so the benchmark measures data engineering competency without requiring domain-specific knowledge.

v0.1 Scenarios

Thirty-six scenarios ship in v0.1:

Tier 1 (13): focused debugging and optimization across ClickHouse, Postgres, and Redpanda.
Tier 2 (18): multi-step transformations, semantic modeling, API hardening, and streaming integration.
Tier 3 (5): end-to-end failure recovery, replay, reconciliation, and migration tradeoff scenarios.

Representative scenarios:

Scenario	Competency	Services	Starting State
CSV Ingest	Data Ingestion and Integration	ClickHouse	Greenfield
DLQ Setup	Reliability and Fault Tolerance	Redpanda + ClickHouse	Broken
Time-Grain Rollups	Transformation and Semantic Modeling	ClickHouse	Broken
CDC Postgres -> Redpanda	Data Ingestion and Integration	Postgres + Redpanda	Greenfield
Cross-System Reconciliation	Data Quality and Observability	Postgres + Redpanda + ClickHouse	Broken
Event Sourcing Replay	Distributed Systems and Consistency	Postgres + Redpanda	Broken
OLTP-to-OLAP Migration	Technology Selection and Architecture Tradeoffs	Postgres + ClickHouse	Broken

Feature Coverage

The expanded Foo Bar set now covers every feature area used in the registry:

performance-dashboards
reporting-metrics-layers
exported-reports
realtime-feeds
analytical-chat

Core Data

Synthetic dimension tables with deterministic row counts.
Generated event streams with configurable cardinality.
Placeholder metric columns (integers, floats, timestamps).
Simple foreign-key relationships for join validation.
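
The properties listed above can be illustrated with a small generator. This is a minimal sketch, not the benchmark's actual seeding code: the function names (`make_dimension`, `make_events`), the column names, and the fixed start timestamp are all hypothetical. It assumes only that seeded generation is what makes row counts and values reproducible.

```python
import random
from datetime import datetime, timedelta, timezone


def make_dimension(n_rows, seed=0):
    """Build a dimension table with exactly n_rows rows (hypothetical schema)."""
    rng = random.Random(seed)  # local RNG: the same seed always yields the same rows
    return [
        {"dim_id": i, "label": f"dim_{i}", "weight": rng.random()}
        for i in range(n_rows)
    ]


def make_events(n_events, dim_ids, cardinality, seed=0):
    """Build an event stream whose foreign keys use at most `cardinality` distinct values."""
    rng = random.Random(seed)
    keys = dim_ids[:cardinality]  # cap the number of distinct foreign keys
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)  # arbitrary fixed origin
    return [
        {
            "event_id": i,
            "dim_id": rng.choice(keys),              # foreign key into the dimension
            "metric_int": rng.randint(0, 1000),      # placeholder integer metric
            "metric_float": rng.uniform(0.0, 1.0),   # placeholder float metric
            "ts": start + timedelta(seconds=i),      # placeholder timestamp
        }
        for i in range(n_events)
    ]
```

Calling `make_dimension(5, seed=1)` twice returns identical rows, and every `dim_id` produced by `make_events` exists in the dimension it was built from, which is what a join validation would rely on.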

Characteristic Challenges

Data carries no business meaning, so assertions rely on structural checks rather than domain logic.
The domain is useful for verifying pipeline plumbing, schema propagation, and idempotency without domain expertise.
Seed data ranges from small (5 rows) to large (10M rows) depending on the scenario.
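
As an illustration of what a structural check might look like, the sketch below verifies a row count and referential integrity, and shows a keyed load that can be replayed safely. It is an assumption about the style of assertion, not the benchmark's scoring code: `check_structure`, `load_idempotent`, and the `dim_id` / `event_id` column names are hypothetical, and an in-memory dict stands in for a real target table.

```python
def check_structure(dim_rows, event_rows, expected_dim_count):
    """Return a list of structural problems; an empty list means the checks pass."""
    problems = []

    # Deterministic row count: the dimension must have exactly the expected size.
    if len(dim_rows) != expected_dim_count:
        problems.append(
            f"expected {expected_dim_count} dimension rows, found {len(dim_rows)}"
        )

    # Referential integrity: every event must point at an existing dimension row.
    dim_ids = {row["dim_id"] for row in dim_rows}
    orphans = [row for row in event_rows if row["dim_id"] not in dim_ids]
    if orphans:
        problems.append(f"{len(orphans)} event rows reference a missing dim_id")

    return problems


def load_idempotent(target, rows, key="event_id"):
    """Upsert rows into a dict keyed by primary key, so replaying a batch is a no-op."""
    for row in rows:
        target[row[key]] = row
    return target
```

Neither function inspects what the values mean; both look only at counts and keys. Loading the same batch twice through `load_idempotent` leaves the target unchanged, which is the property an idempotency scenario would assert.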

Why Foo Bar First

Starting with synthetic data eliminates two variables: domain complexity and data contamination risk. Every scenario in the Foo Bar domain is designed for deterministic scoring, making it straightforward to validate the eval infrastructure and scoring framework while still exercising realistic day-to-day data engineering workflows.