The Velocity Problem
An AI agent can produce 40 tests in a single session. It can refactor 800 lines across 12 files in an hour. It can generate, debug, and ship a new MCP server tool in 20 minutes.
This is wonderful — until you realize that human quality processes assume human velocity. Code review works when a developer pushes 200 lines per day. It breaks when an agent pushes 2000.
The solution isn’t to slow down the agent. It’s to build quality controls that operate at the same velocity as production.
Three Mechanical Properties
Every quality control method that works at AI velocity shares three properties:
- Mechanical — No human judgment required to determine pass/fail
- Immediate — Results in seconds, not hours or days
- Compositional — Individual checks combine into system-level guarantees
Human code review has none of these. Linting has all three. The question is: what else can we build that has all three?
The Toolkit
1. @fixed_by — Regression Test Verification
Problem: AI agents write regression tests that pass, but you can’t prove they catch the bug they claim to cover.
Solution: The @fixed_by(commit) decorator mechanically verifies that a
test fails on the pre-fix code and passes on the post-fix code. Uses git
worktrees for isolation — no working tree disruption.
```python
@fixed_by("abc123")
def test_negative_balance():
    wallet = Wallet()
    wallet.withdraw(100)
    assert wallet.balance == -100  # Bug: was clipping to 0
```
Verification: Run pytest --verify-historical to mechanically prove
every @fixed_by test catches its bug.
Velocity match: Verification is automated. Run it in CI. No human needed.
See Post 001: @fixed_by for the full protocol.
2. DDD — Safe Code Removal
Problem: Dead code analysis tools flag 50 items. Some are truly dead, some are called dynamically. An agent can’t just delete everything.
Solution: Demolition-Driven Development (DDD) — the dual of TDD. Write passing tests for dead code, mark xfail, remove code, verify xfail. The annotations become permanent sentinels against accidental re-introduction.
Velocity match: The agent writes tests at AI speed. Each removal is individually verifiable. XPASS alerts are immediate.
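A sentinel from the final step of that cycle might look like the sketch below, assuming pytest's `xfail` marker; the module and function names (`myapp.formatting`, `legacy_format`) are hypothetical. `strict=True` is the key: if the dead code ever returns, the test XPASSes and the suite fails loudly.

```python
import pytest

# DDD sentinel: this test passed while legacy_format() existed.
# The code was then deleted, so the import now fails -- which is
# exactly what xfail(strict=True) demands. An unexpected pass
# (XPASS) means the dead code was re-introduced.
@pytest.mark.xfail(strict=True, reason="legacy_format() removed in DDD pass")
def test_legacy_format_removed():
    from myapp.formatting import legacy_format  # deleted in the DDD pass
    assert legacy_format("x") == "X"
```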
See Post 003: DDD for the full methodology.
3. Property-Based Testing — Semantic Invariants
Problem: Unit tests check specific cases. An agent producing code at velocity needs to know that invariants hold, not just that 5 examples work.
Solution: Declare properties that must hold for all inputs. Let Hypothesis generate thousands of edge cases automatically.
```python
from hypothesis import assume, given, strategies as st

@given(st.integers(), st.integers())
def test_add_commutative(a, b):
    assert add(a, b) == add(b, a)

@given(st.integers())
def test_balance_never_negative_after_deposit(amount):
    assume(amount > 0)
    wallet = Wallet()
    wallet.deposit(amount)
    assert wallet.balance >= 0
```
Velocity match: One property test replaces dozens of example tests. Hypothesis finds edge cases the agent wouldn’t think to test.
4. Idempotent Migration Tests — Schema Safety
Problem: AI agents modify database schemas. How do you know a migration is safe to re-run? How do you know it doesn’t destroy data?
Solution: Test that every migration is idempotent (running it twice produces the same result as running it once):
```python
def test_migration_idempotent(fresh_db):
    migrate(fresh_db)  # First run
    state_after_first = snapshot(fresh_db)
    migrate(fresh_db)  # Second run
    state_after_second = snapshot(fresh_db)
    assert state_after_first == state_after_second
```
Velocity match: Mechanical. Immediate. Catches the entire class of “migration breaks on re-run” bugs that plague rapid development.
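The `migrate`, `snapshot`, and `fresh_db` names above stand in for a real engine and migration script. A fully self-contained toy version of the same re-run check, with a dict as the "database", might look like this:

```python
import copy

def migrate(db):
    # Add an email_verified column with a default; every operation
    # is written to be a no-op if it has already been applied.
    for row in db["users"]:
        row.setdefault("email_verified", False)
    if "email_verified" not in db["columns"]:
        db["columns"].append("email_verified")

def snapshot(db):
    return copy.deepcopy(db)

fresh_db = {"columns": ["id", "name"], "users": [{"id": 1, "name": "ada"}]}

migrate(fresh_db)
first = snapshot(fresh_db)
migrate(fresh_db)
second = snapshot(fresh_db)
assert first == second  # second run changed nothing
```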
5. Portal Packaging — Minimal API Surface
Problem: AI agents extract functionality into packages. How do you prevent the package from becoming a kitchen-sink dependency?
Solution: Portal extraction — a specialized API slice with constant propagation. Instead of exposing a general-purpose library, extract the narrowest possible interface with all internal constants inlined.
Example: pytest-fixed-by exposes exactly 3 git operations with hardcoded
flags (--detach, --force, capture_output=True). Users can’t
misconfigure them because there’s nothing to configure.
Velocity match: The narrower the portal, the easier the adapter. Three functions with primitive-type signatures are trivially adoptable.
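A hypothetical sketch of such a portal module, in the spirit described above but not the actual pytest-fixed-by source: three git operations, every flag inlined, primitive-type signatures only. The command builders are pure functions so the entire surface is testable without touching git; `_run` is the single side-effecting line.

```python
import subprocess

def _run(cmd):
    # All flags are baked into the builders below; nothing to configure.
    return subprocess.run(cmd, capture_output=True, text=True, check=True)

def worktree_add_cmd(path, commit):
    return ["git", "worktree", "add", "--detach", path, commit]

def worktree_remove_cmd(path):
    return ["git", "worktree", "remove", "--force", path]

def rev_parse_cmd(ref):
    return ["git", "rev-parse", ref]

def worktree_add(path, commit):
    return _run(worktree_add_cmd(path, commit))

def worktree_remove(path):
    return _run(worktree_remove_cmd(path))

def rev_parse(ref):
    return _run(rev_parse_cmd(ref)).stdout.strip()
```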
The Stack
These methods compose. For a single bug fix, an agent can:
- Fix the bug
- Write a @fixed_by test (proves the test catches the bug)
- Run property tests (proves invariants still hold)
- Clean up dead code with DDD (proves removal was safe)
- Ship as a portal package (proves API is minimal)
Each step is mechanical, immediate, and compositional. No human in the loop for verification — only for direction.
What Doesn’t Work at Velocity
Some traditional practices don’t scale. This isn’t a criticism — they were designed for human velocity and they work well there:
| Practice | Why It Doesn’t Scale |
|---|---|
| Code review | Requires sustained human attention; the time needed to review generated source would rate-limit most agentic development workflows |
| Manual QA | Linear scaling, can’t match exponential output |
| Documentation-first | The agent produces code faster than docs can be written and reviewed; stale docs accumulate, outgrow the context window, and erode coherence |
| Gradual refactoring | Agents are often better suited to one sweeping refactoring than to composing several incremental ones |
The replacement isn’t “skip these.” It’s “use mathematical and mechanical verification methodologies that provide better guarantees, faster.”
| Practice | How We Adapt At Ruach Tov |
|---|---|
| Code review | Express intent as concise declarative specifications — annotations like @fixed_by(commit), boundary contracts, structured test declarations — rather than reviewing the generated source code directly. Humans review the specifications; mechanical verification handles the rest. |
| Manual QA | Declarative specifications produce testable invariants. Cross-product composition (“these properties × that state machine”) generates exhaustive test suites mechanically. Property-based testing, Hypothesis, and stateful properties replace manual verification. |
| Documentation-first | For upstream dependencies, curated knowledge documents (like Context Hub entries) map API behavior to structured, declarative descriptions. For our own projects, documentation derives from the specifications rather than the other way around — the spec is the source of truth. |
| Gradual refactoring | Extensive test coverage, built with the methodologies above, makes sweeping refactorings safe |
The Meta-Pattern
All of these methods share a deeper pattern: make the computer prove what a human would have to review.
- @fixed_by: “Does this test catch this bug?” → git worktree proof
- DDD: “Is this code safe to remove?” → xfail proof
- Property tests: “Does this invariant hold?” → Hypothesis proof
- Idempotent tests: “Is this migration safe?” → re-run proof
- Portal packaging: “Is this API minimal?” → constant propagation proof
When an AI agent produces code at 10-100x velocity, the only way to maintain quality is to produce proof at the same velocity. These methods do that.
These methods were developed and battle-tested in the Ruach Tov project, where AI agents maintain a system of 15+ MCP servers, 1000+ tests, and 50,000+ lines of code across Python, TypeScript, and Rust.