The Velocity Problem
An AI agent can produce 40 tests in a single session. It can refactor 800 lines across 12 files in an hour. It can generate, debug, and ship a new MCP server tool in 20 minutes.
This is wonderful — until you realize that human quality processes assume human velocity. Code review works when a developer pushes 200 lines per day. It breaks when an agent pushes 2000.
The solution isn’t to slow down the agent. It’s to build quality controls that operate at the same velocity as production.
Three Mechanical Properties
Every quality control method that works at AI velocity shares three properties:
- Mechanical — No human judgment required to determine pass/fail
- Immediate — Results in seconds, not hours or days
- Compositional — Individual checks combine into system-level guarantees
Human code review has none of these. Linting has all three. The question is: what else can we build that has all three?
The Toolkit
1. @fixed_by — Regression Test Verification
Problem: AI agents write regression tests that pass, but you can’t prove they catch the bug they claim to cover.
Solution: The @fixed_by(commit) decorator mechanically verifies that a
test fails on the pre-fix code and passes on the post-fix code. Uses git
worktrees for isolation — no working tree disruption.
```python
@fixed_by("abc123")
def test_negative_balance():
    wallet = Wallet()
    wallet.withdraw(100)
    assert wallet.balance == -100  # Bug: was clipping to 0
```
Verification: Run pytest --verify-historical to mechanically prove
every @fixed_by test catches its bug.
Velocity match: Verification is automated. Run it in CI. No human needed.
See Post 001: @fixed_by for the full protocol.
2. DDD — Safe Code Removal
Problem: Dead code analysis tools flag 50 items. Some are truly dead, some are called dynamically. An agent can’t just delete everything.
Solution: Demolition-Driven Development (DDD) — the dual of TDD. Write passing tests for dead code, mark xfail, remove code, verify xfail. The annotations become permanent sentinels against accidental re-introduction.
Velocity match: The agent writes tests at AI speed. Each removal is individually verifiable. XPASS alerts are immediate.
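A sentinel from the final step of that cycle might look like the sketch below, assuming pytest's `xfail` marker; the module and function names (`myapp.formatting`, `legacy_format`) are hypothetical. `strict=True` is the key: if the dead code ever returns, the test XPASSes and the suite fails loudly.

```python
import pytest

# DDD sentinel: this test passed while legacy_format() existed.
# The code was then deleted, so the import now fails -- which is
# exactly what xfail(strict=True) demands. An unexpected pass
# (XPASS) means the dead code was re-introduced.
@pytest.mark.xfail(strict=True, reason="legacy_format() removed in DDD pass")
def test_legacy_format_removed():
    from myapp.formatting import legacy_format  # deleted in the DDD pass
    assert legacy_format("x") == "X"
```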
See Post 003: DDD for the full methodology.
3. Property-Based Testing — Semantic Invariants
Problem: Unit tests check specific cases. An agent producing code at velocity needs to know that invariants hold, not just that 5 examples work.
Solution: Declare properties that must hold for all inputs. Let Hypothesis generate thousands of edge cases automatically.
```python
from hypothesis import assume, given, strategies as st

@given(st.integers(), st.integers())
def test_add_commutative(a, b):
    assert add(a, b) == add(b, a)

@given(st.integers())
def test_balance_never_negative_after_deposit(amount):
    assume(amount > 0)
    wallet = Wallet()
    wallet.deposit(amount)
    assert wallet.balance >= 0
```
Velocity match: One property test replaces dozens of example tests. Hypothesis finds edge cases the agent wouldn’t think to test.
4. Idempotent Migration Tests — Schema Safety
Problem: AI agents modify database schemas. How do you know a migration is safe to re-run? How do you know it doesn’t destroy data?
Solution: Test that every migration is idempotent (running it twice produces the same result as running it once):
```python
def test_migration_idempotent(fresh_db):
    migrate(fresh_db)  # First run
    state_after_first = snapshot(fresh_db)
    migrate(fresh_db)  # Second run
    state_after_second = snapshot(fresh_db)
    assert state_after_first == state_after_second
```
Velocity match: Mechanical. Immediate. Catches the entire class of “migration breaks on re-run” bugs that plague rapid development.
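The `migrate`, `snapshot`, and `fresh_db` names above stand in for a real engine and migration script. A fully self-contained toy version of the same re-run check, with a dict as the "database", might look like this:

```python
import copy

def migrate(db):
    # Add an email_verified column with a default; every operation
    # is written to be a no-op if it has already been applied.
    for row in db["users"]:
        row.setdefault("email_verified", False)
    if "email_verified" not in db["columns"]:
        db["columns"].append("email_verified")

def snapshot(db):
    return copy.deepcopy(db)

fresh_db = {"columns": ["id", "name"], "users": [{"id": 1, "name": "ada"}]}

migrate(fresh_db)
first = snapshot(fresh_db)
migrate(fresh_db)
second = snapshot(fresh_db)
assert first == second  # second run changed nothing
```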
5. Portal Packaging — Minimal API Surface
Problem: AI agents extract functionality into packages. How do you prevent the package from becoming a kitchen-sink dependency?
Solution: Portal extraction — a specialized API slice with constant propagation. Instead of exposing a general-purpose library, extract the narrowest possible interface with all internal constants inlined.
Example: pytest-fixed-by exposes exactly 3 git operations with hardcoded
flags (--detach, --force, capture_output=True). Users can’t
misconfigure them because there’s nothing to configure.
Velocity match: The narrower the portal, the easier the adapter. Three functions with primitive-type signatures are trivially adoptable.
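A hypothetical sketch of such a portal module, in the spirit described above but not the actual pytest-fixed-by source: three git operations, every flag inlined, primitive-type signatures only. The command builders are pure functions so the entire surface is testable without touching git; `_run` is the single side-effecting line.

```python
import subprocess

def _run(cmd):
    # All flags are baked into the builders below; nothing to configure.
    return subprocess.run(cmd, capture_output=True, text=True, check=True)

def worktree_add_cmd(path, commit):
    return ["git", "worktree", "add", "--detach", path, commit]

def worktree_remove_cmd(path):
    return ["git", "worktree", "remove", "--force", path]

def rev_parse_cmd(ref):
    return ["git", "rev-parse", ref]

def worktree_add(path, commit):
    return _run(worktree_add_cmd(path, commit))

def worktree_remove(path):
    return _run(worktree_remove_cmd(path))

def rev_parse(ref):
    return _run(rev_parse_cmd(ref)).stdout.strip()
```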
The Stack
These methods compose. For a single bug fix, an agent can:
- Fix the bug
- Write a @fixed_by test (proves the test catches the bug)
- Run property tests (proves invariants still hold)
- Clean up dead code with DDD (proves removal was safe)
- Ship as a portal package (proves API is minimal)
Each step is mechanical, immediate, and compositional. No human in the loop for verification — only for direction.
What Doesn’t Work at Velocity
Some traditional practices don’t scale. This isn’t a criticism — they were designed for human velocity and they work well there:
| Practice | Why It Doesn’t Scale |
|---|---|
| Code review | Requires sustained human attention; the time needed to review generated source would rate-limit most agentic development workflows |
| Manual QA | Linear scaling, can’t match exponential output |
| Documentation-first | The agent produces code faster than docs can be written and reviewed; stale docs accumulate, outgrow the context window, and erode coherence |
| Gradual refactoring | Agents are often better suited to one sweeping refactoring than to composing several incremental ones |
The replacement isn’t “skip these.” It’s “use mathematical and mechanical verification methodologies that provide better guarantees, faster.”
| Practice | How We Adapt At Ruach Tov |
|---|---|
| Code review | Express intent as concise declarative specifications — annotations like @fixed_by(commit), boundary contracts, structured test declarations — rather than reviewing the generated source code directly. Humans review the specifications; mechanical verification handles the rest. |
| Manual QA | Declarative specifications produce testable invariants. Cross-product composition (“these properties × that state machine”) generates exhaustive test suites mechanically. Property-based testing, Hypothesis, and stateful properties replace manual verification. |
| Documentation-first | For upstream dependencies, curated knowledge documents (like Context Hub entries) map API behavior to structured, declarative descriptions. For our own projects, documentation derives from the specifications rather than the other way around — the spec is the source of truth. |
| Gradual refactoring | Extensive test coverage, built with the methodologies above, makes sweeping refactorings safe |
The Meta-Pattern
All of these methods share a deeper pattern: make the computer prove what a human would have to review.
- @fixed_by: “Does this test catch this bug?” → git worktree proof
- DDD: “Is this code safe to remove?” → xfail proof
- Property tests: “Does this invariant hold?” → Hypothesis proof
- Idempotent tests: “Is this migration safe?” → re-run proof
- Portal packaging: “Is this API minimal?” → constant propagation proof
When an AI agent produces code at 10-100x velocity, the only way to maintain quality is to produce proof at the same velocity. These methods do that.
These methods were developed and battle-tested in the Ruach Tov project, where AI agents maintain a system of 15+ MCP servers, 1000+ tests, and 50,000+ lines of code across Python, TypeScript, and Rust.