Writing Gate Assertions
Understand core and scenario-specific assertions, what belongs in each gate, and patterns from existing scenarios.
Core vs scenario assertions
Every scenario is evaluated by two layers of assertions:
Core assertions are built into the framework and run for every scenario automatically. You do not write these.
| Gate | Core assertions |
|---|---|
| Functional | process_exits_clean (exit code 0), no_unhandled_errors (no tracebacks or panics in session log) |
| Robust | idempotent_rerun (scenario can be rerun without errors) |
| Production | uses_env_vars, no_secrets_in_code, no_dead_code_markers, no_debug_artifacts, zero_compiler_errors, zero_lint_errors, has_type_safety, functions_are_focused, no_deep_nesting, and others |
The Correct and Performant gates have no core assertions. These are entirely scenario-specific.
Scenario assertions are what you write in assertions/. They test whether the agent solved your specific problem. Each gate file exports named async functions that the framework discovers and runs.
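To make the "named async functions that the framework discovers and runs" idea concrete, here is a minimal sketch of one way a runner could iterate over a gate's exports. It is not the framework's actual implementation. The AssertionResult and AssertionContext shapes, the functionalGate object, and runGate are all placeholders invented for illustration.

```typescript
// Hypothetical stand-ins for the framework's types; the real fields may differ.
interface AssertionResult { passed: boolean; message: string }
interface AssertionContext { workspaceDir: string }
type Assertion = (ctx: AssertionContext) => Promise<AssertionResult>;

// Stand-in for a gate file's exports: each key is an assertion name.
const functionalGate: Record<string, Assertion> = {
  target_table_exists: async () => ({ passed: true, message: "table found" }),
  row_count_non_zero: async () => ({ passed: false, message: "Expected > 0 rows, got 0" }),
};

// One possible discovery loop: call every exported function by name and
// turn thrown errors into failed results so one bad check cannot hide others.
async function runGate(
  gate: Record<string, Assertion>,
  ctx: AssertionContext
): Promise<Record<string, AssertionResult>> {
  const results: Record<string, AssertionResult> = {};
  for (const name of Object.keys(gate)) {
    try {
      results[name] = await gate[name](ctx);
    } catch (err) {
      results[name] = { passed: false, message: `${name} threw: ${String(err)}` };
    }
  }
  return results;
}
```

In a real gate file the module's named exports would play the role of functionalGate, which is why the function name doubles as the assertion's identifier in reports.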
Guidelines
These apply across all gates:
- One check per function. A function named table_has_rows should not also verify schema.
- Make failure messages actionable. "Expected 15 rows, got 12" is useful. "Assertion failed" is not.
- Avoid side effects. Assertions read state; they do not modify it. The robust gate's idempotency check is the exception.
- Keep assertions fast. Long-running assertions slow down the feedback loop.
- Use the shared helpers in _shared/assertion-helpers.ts for common checks like hardcoded connections and SELECT * scanning.
- Browse existing scenario assertions for more patterns.
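The first two guidelines can be illustrated with a small helper that performs a single comparison and reports both the expected and actual values on failure. This is only a sketch: the AssertionResult shape and the expectRowCount helper are assumptions, not part of the framework.

```typescript
// Hypothetical result shape; the framework's real AssertionResult may differ.
interface AssertionResult { passed: boolean; message: string }

// Performs exactly one check (row count) and names both values in the message,
// so a failure tells the reader what to look for without rerunning anything.
function expectRowCount(table: string, actual: number, expected: number): AssertionResult {
  return actual === expected
    ? { passed: true, message: `${table} has ${expected} rows` }
    : { passed: false, message: `Expected ${expected} rows in ${table}, got ${actual}` };
}
```

A schema check would live in its own function with its own message rather than being folded into this one.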
Gate 1: Functional
Check that the agent produced the expected artifacts. These are typically existence checks.
- Target table exists in the database
- Expected files are present
- Basic row count is non-zero
export async function target_table_exists(ctx: AssertionContext): Promise<AssertionResult> {
  // SELECT count() FROM system.tables WHERE database = 'analytics' AND name = 'events'
}
View full assertion file from the foo-bar-csv-ingest scenario
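A filled-in version of the existence check might look like the sketch below. The query is the one from the stub above. The ctx.query helper and the fields on AssertionResult are assumptions, since the real context API is not shown on this page.

```typescript
// Hypothetical shapes; the framework's real types may differ.
interface AssertionResult { passed: boolean; message: string }
interface AssertionContext {
  // Assumed helper: runs SQL against the target database and returns rows.
  query: (sql: string) => Promise<Array<Record<string, unknown>>>;
}

export async function target_table_exists(ctx: AssertionContext): Promise<AssertionResult> {
  const rows = await ctx.query(
    "SELECT count() AS n FROM system.tables WHERE database = 'analytics' AND name = 'events'"
  );
  // Counts can come back as strings from some drivers, so normalise with Number().
  const n = Number(rows[0]?.n ?? 0);
  return n === 1
    ? { passed: true, message: "Table analytics.events exists" }
    : { passed: false, message: `Expected table analytics.events to exist, found ${n} matching tables` };
}
```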
Gate 2: Correct
Validate that the data is accurate. These checks use exact counts, checksums, or value comparisons.
- Exact row counts match expected totals
- Checksums match between source and target
- No null values in required fields
- Date ranges are valid
- Cross-system counts agree
export async function all_fifteen_events_loaded(ctx: AssertionContext): Promise<AssertionResult> {
  // SELECT count() FROM analytics.events -- expects exactly 15
}
export async function revenue_checksum(ctx: AssertionContext): Promise<AssertionResult> {
  // Compare SUM(amount) between Postgres source and ClickHouse target
  // Math.abs(sourceSum - targetSum) < 0.001
}
Example assertions: foo-bar-csv-ingest scenario | ecommerce-pipeline-recovery scenario
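The checksum comparison can be sketched as follows, using the 0.001 tolerance from the stub above. The querySource and queryTarget helpers and both table names are hypothetical. A real scenario would use whatever clients its context provides.

```typescript
// Hypothetical shapes; the framework's real types may differ.
interface AssertionResult { passed: boolean; message: string }
interface AssertionContext {
  // Assumed helpers, one per system.
  querySource: (sql: string) => Promise<Array<Record<string, unknown>>>;
  queryTarget: (sql: string) => Promise<Array<Record<string, unknown>>>;
}

const TOLERANCE = 0.001;

// True when the two sums differ by less than the tolerance.
// NaN (for example, from a missing row) never compares as equal.
function sumsAgree(sourceSum: number, targetSum: number, tolerance: number = TOLERANCE): boolean {
  return Math.abs(sourceSum - targetSum) < tolerance;
}

export async function revenue_checksum(ctx: AssertionContext): Promise<AssertionResult> {
  // Table names below are placeholders for illustration only.
  const [source] = await ctx.querySource("SELECT SUM(amount) AS total FROM orders");
  const [target] = await ctx.queryTarget("SELECT sum(amount) AS total FROM analytics.orders");
  const sourceSum = Number(source?.total);
  const targetSum = Number(target?.total);
  return sumsAgree(sourceSum, targetSum)
    ? { passed: true, message: `Revenue sums match (${sourceSum})` }
    : { passed: false, message: `Expected SUM(amount) to match within ${TOLERANCE}, got source ${sourceSum} vs target ${targetSum}` };
}
```

Comparing with a tolerance rather than strict equality avoids false failures from floating-point representation differences between the two systems.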
Gate 3: Robust
Test that the solution is resilient. These checks look for common data problems and structural quality.
- No duplicate header rows in loaded data
- No data loss under volume
- Schema has primary key constraints
- Required columns are present
Many scenarios also use shared helpers from scenarios/_shared/assertion-helpers.ts:
- scanWorkspaceForHardcodedConnections(): scans agent-written code for hardcoded connection strings
- avoidsSelectStarQueries(): scans for SELECT * patterns
import { scanWorkspaceForHardcodedConnections } from "../../_shared/assertion-helpers";
export async function no_hardcoded_connection_strings(): Promise<AssertionResult> {
  return scanWorkspaceForHardcodedConnections();
}
Example assertions: foo-bar-csv-ingest scenario
See also: shared assertion helpers
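For the structural checks listed above, a "required columns are present" assertion could be sketched like this. The column names, the ctx.query helper, and the result shape are all assumptions. The query reads ClickHouse's system.columns table for the analytics.events table used in the earlier examples.

```typescript
// Hypothetical shapes; the framework's real types may differ.
interface AssertionResult { passed: boolean; message: string }
interface AssertionContext {
  query: (sql: string) => Promise<Array<Record<string, unknown>>>;
}

// Placeholder column list; substitute the columns your scenario requires.
const REQUIRED_COLUMNS = ["event_id", "event_type", "event_time"];

// Returns the required columns that are absent from the actual schema.
function missingColumns(actual: string[], required: string[]): string[] {
  const present = new Set(actual);
  return required.filter((column) => !present.has(column));
}

export async function required_columns_present(ctx: AssertionContext): Promise<AssertionResult> {
  const rows = await ctx.query(
    "SELECT name FROM system.columns WHERE database = 'analytics' AND table = 'events'"
  );
  const missing = missingColumns(rows.map((row) => String(row.name)), REQUIRED_COLUMNS);
  return missing.length === 0
    ? { passed: true, message: "All required columns are present" }
    : { passed: false, message: `Missing required columns: ${missing.join(", ")}` };
}
```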
Gate 4: Performant
Measure query latency and pipeline throughput against thresholds. Thresholds vary by scenario.
- Query execution stays under the scenario's threshold (for example, 100-200 ms)
- No SELECT * in production queries
export async function scan_query_under_100ms(ctx: AssertionContext): Promise<AssertionResult> {
  // Time the query execution, check elapsed < 100
}
Example assertions: foo-bar-slow-queries scenario
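One way to implement the timing check is to wrap the query in a small helper that measures elapsed wall-clock time. The sketch below assumes a ctx.query helper and uses a placeholder query. Neither comes from the framework. The clock is passed in as a parameter so the helper can be exercised without real delays.

```typescript
// Hypothetical shapes; the framework's real types may differ.
interface AssertionResult { passed: boolean; message: string }
interface AssertionContext {
  query: (sql: string) => Promise<Array<Record<string, unknown>>>;
}

// Runs a query and compares elapsed wall-clock time against a threshold.
async function queryUnder(
  ctx: AssertionContext,
  sql: string,
  thresholdMs: number,
  now: () => number = Date.now
): Promise<AssertionResult> {
  const start = now();
  await ctx.query(sql);
  const elapsedMs = now() - start;
  return elapsedMs < thresholdMs
    ? { passed: true, message: `Query took ${elapsedMs}ms (threshold ${thresholdMs}ms)` }
    : { passed: false, message: `Expected query under ${thresholdMs}ms, took ${elapsedMs}ms` };
}

export async function scan_query_under_100ms(ctx: AssertionContext): Promise<AssertionResult> {
  // Placeholder query; time the scan your scenario actually cares about.
  return queryUnder(ctx, "SELECT count() FROM analytics.events", 100);
}
```

This measures client-side time, so it includes the network round trip, and a single measurement can be noisy.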
Gate 5: Production
Check production-readiness beyond correctness. Most of these are handled by core assertions, but scenarios can add their own.
- Connection environment variables are available
- No temporary or test tables left behind
- Documentation exists
- Data grain is enforced (e.g. daily grain)
export async function no_temporary_tables(ctx: AssertionContext): Promise<AssertionResult> {
  // COUNT(*) WHERE table LIKE '%tmp%' = 0
}
Example assertions: ecommerce-pipeline-recovery scenario
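A fuller version of the leftover-table check might list the offending tables so the failure message is actionable. The sketch assumes a ctx.query helper and that the tables live in the analytics database used in the earlier examples. The stub above specifies neither.

```typescript
// Hypothetical shapes; the framework's real types may differ.
interface AssertionResult { passed: boolean; message: string }
interface AssertionContext {
  query: (sql: string) => Promise<Array<Record<string, unknown>>>;
}

export async function no_temporary_tables(ctx: AssertionContext): Promise<AssertionResult> {
  // Assumed database name; the LIKE pattern follows the stub above.
  const rows = await ctx.query(
    "SELECT name FROM system.tables WHERE database = 'analytics' AND name LIKE '%tmp%'"
  );
  const names = rows.map((row) => String(row.name));
  return names.length === 0
    ? { passed: true, message: "No temporary tables found" }
    : { passed: false, message: `Expected no temporary tables, found ${names.length}: ${names.join(", ")}` };
}
```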