Foo Bar (Dummy)
Synthetic domain for the v0.1 release. Thirty-six scenarios spanning ingestion, schema design, query optimization, reliability, observability, streaming, and cross-system reconciliation.
The Foo Bar domain is the foundation of the v0.1 release. It uses synthetic data with no production business semantics, so the benchmark can measure data engineering competency without requiring domain-specific knowledge.
v0.1 Scenarios
Thirty-six scenarios ship in v0.1:
- Tier 1 (13): focused debugging and optimization across ClickHouse, Postgres, and Redpanda.
- Tier 2 (18): multi-step transformations, semantic modeling, API hardening, and streaming integration.
- Tier 3 (5): end-to-end failure recovery, replay, reconciliation, and migration tradeoff scenarios.
Representative scenarios:
| Scenario | Competency | Services | Starting State |
|---|---|---|---|
| CSV Ingest | Data Ingestion and Integration | ClickHouse | Greenfield |
| DLQ Setup | Reliability and Fault Tolerance | Redpanda + ClickHouse | Broken |
| Time-Grain Rollups | Transformation and Semantic Modeling | ClickHouse | Broken |
| CDC Postgres -> Redpanda | Data Ingestion and Integration | Postgres + Redpanda | Greenfield |
| Cross-System Reconciliation | Data Quality and Observability | Postgres + Redpanda + ClickHouse | Broken |
| Event Sourcing Replay | Distributed Systems and Consistency | Postgres + Redpanda | Broken |
| OLTP-to-OLAP Migration | Technology Selection and Architecture Tradeoffs | Postgres + ClickHouse | Broken |
Feature Coverage
The expanded Foo Bar set now includes scenarios mapped to all feature areas used in the registry:
- performance-dashboards
- reporting-metrics-layers
- exported-reports
- realtime-feeds
- analytical-chat
Core Data
- Synthetic dimension tables with deterministic row counts.
- Generated event streams with configurable cardinality.
- Placeholder metric columns (integers, floats, timestamps).
- Simple foreign-key relationships for join validation.
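The properties listed above can be illustrated with a minimal sketch. It is not the benchmark's actual generator: the function names (`make_dimension`, `make_events`), column names, and the seeding scheme are all hypothetical, chosen only to show how a deterministic row count, a configurable cardinality, and a simple foreign key might fit together.

```python
import random
from datetime import datetime, timedelta, timezone


def make_dimension(n_rows: int) -> list[dict]:
    """Hypothetical dimension table with an exact, deterministic row count."""
    return [{"dim_id": i, "label": f"dim_{i:04d}"} for i in range(n_rows)]


def make_events(n_events: int, cardinality: int, seed: int = 0) -> list[dict]:
    """Hypothetical event stream; dim_id takes at most `cardinality` distinct values."""
    rng = random.Random(seed)  # local seeded RNG: the same seed yields the same stream
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    return [
        {
            "event_id": i,
            "dim_id": rng.randrange(cardinality),  # foreign key into the dimension
            "metric_int": rng.randint(0, 1000),    # placeholder integer metric
            "metric_float": rng.random(),          # placeholder float metric
            "ts": start + timedelta(seconds=i),    # placeholder timestamp
        }
        for i in range(n_events)
    ]


dims = make_dimension(5)
events = make_events(100, cardinality=5, seed=42)
```

Because the random generator is seeded and local to the function, calling `make_events` twice with the same arguments produces identical rows, and every `dim_id` in `events` joins to a row in `dims` as long as `cardinality` does not exceed the dimension row count.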
Characteristic Challenges
- Data carries no business meaning, so assertions rely on structural checks rather than domain logic.
- Useful for verifying pipeline plumbing, schema propagation, and idempotency without domain expertise.
- Seed data ranges from small (5 rows) to large (10M rows) depending on the scenario.
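As a rough illustration of what a structural check and an idempotency check might look like, here is a small sketch. The helper names (`check_structure`, `load_idempotent`) and the in-memory dictionary standing in for a target table are assumptions made for this example, not part of the benchmark's harness.

```python
def check_structure(rows: list[dict], expected_columns: set[str], expected_count: int) -> bool:
    """Structural assertion: exact row count and column set, with no business rules."""
    if len(rows) != expected_count:
        return False
    return all(set(row) == expected_columns for row in rows)


def load_idempotent(target: dict, rows: list[dict], key: str) -> dict:
    """Upsert rows by key, so replaying the same batch leaves the target unchanged."""
    for row in rows:
        target[row[key]] = row
    return target


# Hypothetical 5-row seed, matching the small end of the seed-data range.
seed_rows = [{"id": i, "value": i * 2} for i in range(5)]

target: dict = {}
load_idempotent(target, seed_rows, key="id")
snapshot = dict(target)

# Loading the same batch a second time should be a no-op.
load_idempotent(target, seed_rows, key="id")
```

The checks only look at shape (counts, column names, keys), which is all that synthetic data without business meaning can support.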
Why Foo Bar First
Starting with synthetic data eliminates two variables: domain complexity and data contamination risk. Every scenario in the Foo Bar domain is designed for deterministic scoring, making it straightforward to validate the eval infrastructure and scoring framework while still exercising realistic day-to-day data engineering workflows.