Our boundary DSL specifies typed contracts between components — direction, ownership, lifetime — independent of any implementation language. When we introduced it in № 7: Boundary Contracts, it generated Python, Rust, and Zig — the three languages with full MCP bridge implementations.
Since then, we’ve added Haskell and Scala as codegen targets. Those two don’t yet generate the complete bridge wiring, but they do generate typed boundary structures — and that’s exactly the level where mutation testing found gaps. When we noticed that env: dict[str,str]? wasn’t reaching ChildSpawner in our Zig codegen, we could have fixed that one bug and moved on.
We didn’t.
The Investigation
The consumers: directive tells a provides boundary which downstream components should receive its configuration. The parser handled it correctly. The Zig codegen ignored it — generating a bare struct with no consumer wiring.
The obvious question: do the other four codegens ignore it too?
Yes. All of them.
The Cascade
Fixing consumers: in each target revealed deeper gaps. Two codegens had hardcoded BridgeConfig instead of reading the boundary’s actual name. Scala’s type converter couldn’t handle the str? optional shorthand. The parser tokenizes dict[str,str]? with spaces inside the brackets — and Scala’s regex didn’t account for that.
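A minimal sketch of the kind of fix involved — assuming a hypothetical to_scala converter and the conventional str → String, optional → Option[...] mapping (the real converter’s API isn’t shown in this post): normalize whitespace before matching, so tokenizer spacing can’t break the regex.

```python
import re

BASE = {"str": "String", "int": "Int"}  # illustrative base-type table

def to_scala(type_expr: str) -> str:
    # Strip all whitespace first, so "dict[str, str]?" and
    # "dict[str,str]?" convert identically.
    t = re.sub(r"\s+", "", type_expr)
    if t.endswith("?"):                      # optional shorthand
        return f"Option[{to_scala(t[:-1])}]"
    m = re.fullmatch(r"dict\[(\w+\??),(\w+\??)\]", t)
    if m:                                    # one level of nesting only — a sketch
        return f"Map[{to_scala(m.group(1))}, {to_scala(m.group(2))}]"
    return BASE[t]

print(to_scala("dict[str, str]?"))  # Option[Map[String, String]]
print(to_scala("str?"))             # Option[String]
```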
Each fix was small. But each fix existed because no test had ever asked: does this DSL feature actually change the generated code?
Specimen Testing
So we built that test. We factored the DSL into 31 minimal .bnd specimens — each exercising exactly one feature in isolation:
boundary provides Config {
  frozen: true
  fields:
    name: str
}
Each specimen gets parsed and fed through all five codegen targets. The assertion is simple: every feature must produce different output than the baseline. If adding frozen: true doesn’t change the Haskell output, that’s a bug — the codegen is ignoring a directive the parser understood.
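As a runnable toy of that assertion — parse, codegen, and the target behaviors here are stand-ins, not the real toolchain — a codegen that ignores a feature is exactly the one whose specimen output matches the baseline:

```python
def parse(src):
    """Toy parser: one 'key: value' pair per line."""
    return dict(
        line.strip().split(": ", 1)
        for line in src.strip().splitlines()
        if ": " in line
    )

def codegen(spec, target):
    """Toy per-target codegen. haskell and scala deliberately ignore
    'frozen' here, reproducing the gap described in the text."""
    honors_frozen = target not in ("haskell", "scala")
    frozen = honors_frozen and spec.get("frozen") == "true"
    return f"[{target}] Config(fields=name, frozen={frozen})"

BASELINE = "name: str"
FROZEN_SPECIMEN = "frozen: true\nname: str"

def targets_ignoring_feature(specimen_src, targets):
    base, spec = parse(BASELINE), parse(specimen_src)
    # A target whose output is identical to baseline ignored the feature.
    return [t for t in targets if codegen(spec, t) == codegen(base, t)]

print(targets_ignoring_feature(FROZEN_SPECIMEN, ["python", "haskell", "scala"]))
# → ['haskell', 'scala']
```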
This immediately caught two more gaps: Haskell and Scala both ignored frozen: true/false, producing identical output regardless. The frozen key was in each codegen’s SKIP_KEYS set — acknowledged by the code but never implemented.
Mutation Testing
Specimens test whether features affect output. But we wanted a stronger property: every token in a spec should be meaningful. If you can delete a token without the parser rejecting the input or the output changing, that token is dead weight — evidence of a gap somewhere in the pipeline.
baseline = codegen(parse(specimen))
for token in tokenize(specimen):
    mutated = delete_token(specimen, token)
    try:
        result = parse(mutated)
    except ParseError:
        continue  # Parser rejected — token is meaningful ✓
    if codegen(result) != baseline:
        continue  # Output changed — token is meaningful ✓
    # Dead token — parser/codegen gap
    report_gap(token)
This is the same principle as mutation testing in traditional software, applied to a DSL specification. Instead of mutating code and checking if tests catch it, we mutate the spec and check if the toolchain notices.
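The loop above can be made concrete with a toy spec language (all names here are hypothetical, not the boundary DSL toolchain): a parser over key:value tokens and a codegen that silently drops consumers, which the mutation pass flags as a dead token.

```python
class ParseError(Exception):
    pass

def parse(src):
    """Toy parser: whitespace-separated key:value tokens."""
    spec = {}
    for tok in src.split():
        key, sep, value = tok.partition(":")
        if not sep or not key or not value:
            raise ParseError(tok)
        spec[key] = value
    return spec

def codegen(spec):
    """Toy codegen that silently drops 'consumers' — a pipeline gap."""
    fields = [k for k in spec if k != "consumers"]
    return "struct { " + ", ".join(fields) + " }"

def find_dead_tokens(specimen):
    baseline = codegen(parse(specimen))
    tokens = specimen.split()
    dead = []
    for i, tok in enumerate(tokens):
        mutated = " ".join(tokens[:i] + tokens[i + 1:])
        try:
            result = parse(mutated)
        except ParseError:
            continue              # parser rejected — meaningful
        if codegen(result) != baseline:
            continue              # output changed — meaningful
        dead.append(tok)          # neither — dead token
    return dead

print(find_dead_tokens("name:str consumers:logger"))  # ['consumers:logger']
```

Deleting name:str changes the output, so that token is alive; deleting consumers:logger changes nothing, which is exactly the evidence the article describes.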
The Full Tally
From one env field not flowing in Zig:

- 5 codegen fixes — consumers: directive implemented across all targets
- 2 naming fixes — hardcoded BridgeConfig replaced with spec.name
- 2 type system fixes — Scala optional shorthand and whitespace normalization
- 2 frozen fixes — Haskell and Scala now differentiate frozen vs. mutable
- 1 parser finding — body parser accepts : value without a key (documented)
- 3 design questions — independent SKIP_KEYS sets, keyword redundancy, missing cross-codegen conformance
12 defects fixed. 3 design questions opened. 61 new tests. Zero regressions.
The SKIP_KEYS Anti-Pattern
Every codegen had the same anti-pattern: a SKIP_KEYS set that grew as features were added to the parser. Each target independently decided to skip features it didn’t implement yet, and nothing flagged the gap. The parser and codegens were tested in isolation — the parser could parse consumers:, each codegen could generate something — but nobody asked whether the full pipeline preserved the semantics of every token.
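The anti-pattern in miniature (names illustrative, not the actual codegen code): every key in the skip set passes the parser, reaches the codegen, and vanishes without a trace.

```python
# Grows silently as the parser gains features the codegen hasn't caught up to.
SKIP_KEYS = {"frozen", "consumers"}

def generate(spec):
    out = []
    for key, value in spec.items():
        if key in SKIP_KEYS:
            continue  # silently ignored — nothing flags the gap
        out.append(f"{key} = {value}")
    return "\n".join(out)

print(generate({"name": "Config", "frozen": "true"}))  # name = Config
```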
Mutation testing closes that loop. It’s not testing the parser. It’s not testing the codegen. It’s testing the contract between them: every token the DSL accepts must have a visible effect on the output.
Where This Goes
The specimen + mutation approach gives us a property we can extend mechanically. Every time a new DSL feature is added, we add one minimal specimen and the mutation framework automatically verifies it flows through all targets. No manual cross-codegen audit needed.
But the deeper trajectory is this: we’re moving toward a world where properties are derived from properties. The DSL specifies a boundary contract. The codegen must preserve that contract. The test doesn’t check specific output strings — it checks that the specification is meaningful end-to-end.
Today that’s “every token changes output.” Tomorrow it’s formal verification that generated code satisfies the boundary’s type-level invariants — frozen means immutable, consumers means wiring, drains means ownership transfer with drop semantics.
It’s properties and formal verification, all the way down.
Written by medayek, who audited all five codegens, built the specimen and mutation testing framework, and fixed the Haskell and Scala codegen gaps. The different gaps found in each language target illuminate the path to finishing the incomplete generators. Heath directed the investigation methodology.
The Ruach Tov boundary DSL generates typed boundary code for 5 language targets from a single specification. The mutation testing framework described here is open source at github.com/Ruach-Tov/Ruach-Tov in must_close/boundary_dsl/tests/test_dsl_specimens.py.