# TDD Has a Dual
Test-Driven Development has a well-defined protocol for adding code:
- Write a failing test
- Write code until it passes
- Refactor
This is universally understood. But what about removing code? Most teams treat code removal as informal: grep for usages, check nothing breaks, push. There’s no protocol that proves the removal was correct.
Demolition-Driven Development (DDD) fills that gap. It is the exact dual of TDD:
| Phase | TDD (Adding) | DDD (Removing) |
|---|---|---|
| Start | Write a failing test | Write a passing test |
| Act | Code until it passes | Mark xfail, remove code |
| Verify | Test passes | Test xfails |
| Sentinel | Passing test catches regressions | xfail catches accidental re-introduction |
## The Protocol
### Step 1: Write Tests That Pass
For each piece of code you want to remove, write tests that exercise it and pass. This sounds backwards — why test code you’re about to delete?
Because the test proves the code exists and works right now. If you can’t write a passing test, you’ve already learned something: the code may be unreachable, which is even stronger evidence it should go.
```python
# This test passes — confirming the vestigial method works today
def test_legacy_sync_method():
    db = Database()
    result = db.sync_all_records()  # vestigial method from v1
    assert result["synced"] == 0
    assert result["status"] == "ok"
```
### Step 2: Annotate with xfail
Mark the test with @pytest.mark.xfail and a reason declaring intent:
```python
import pytest

@pytest.mark.xfail(reason="removing vestigial sync_all_records per Phase 0")
def test_legacy_sync_method():
    db = Database()
    result = db.sync_all_records()
    assert result["synced"] == 0
    assert result["status"] == "ok"
```
### Step 3: Remove the Code
Delete the method, class, module, or whatever you’re removing.
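After deletion, the removed method should be genuinely gone: live behavior still works, and calling the removed method raises AttributeError, which is exactly what makes the Step 2 test fail as expected. A minimal sketch, using a hypothetical Database class:

```python
# A hypothetical Database after Step 3: sync_all_records has been
# deleted, while live methods remain untouched.
class Database:
    def save(self, record):
        return True

db = Database()
assert db.save({"id": 1})        # live behavior still works

try:
    db.sync_all_records()        # the removed method
    print("removal incomplete: method still present")
except AttributeError:
    print("removal confirmed: sync_all_records is gone")
```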
### Step 4: Verify
Run pytest. The xfail test should now report XFAIL (expected failure). All other tests should still pass.
If instead you see XPASS (unexpected pass), something else is providing that behavior. You’ve misidentified the removal — investigate before proceeding.
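The Step 4 check can be exercised end to end. The sketch below (assuming pytest is installed; the generated test uses a stand-in Database class rather than the article's real one) writes a removal-proof test to a temporary directory, runs pytest on it, and confirms the run succeeds with the test reported as xfailed:

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# A removal-proof test: marked xfail, and failing because the
# method was deleted in Step 3 (simulated by an empty stand-in class).
TEST_SRC = textwrap.dedent("""\
    import pytest

    @pytest.mark.xfail(reason="removing vestigial sync_all_records")
    def test_legacy_sync_method():
        class Database:  # stand-in for the post-removal class
            pass
        Database().sync_all_records()  # raises AttributeError
""")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test_removal.py")
    with open(path, "w") as fh:
        fh.write(TEST_SRC)
    # -rx prints the XFAIL reasons in the short test summary
    proc = subprocess.run(
        [sys.executable, "-m", "pytest", "-q", "-rx", path],
        capture_output=True, text=True, cwd=tmp,
    )

print(proc.returncode)             # 0: an xfail does not fail the run
print("1 xfailed" in proc.stdout)  # summary line reports the XFAIL
```

The return code is the key signal: pytest treats an expected failure as success, so the removal proof can run in CI without special handling.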
## Why This Matters for AI Agents
AI agents removing code face a unique governance problem: who verifies the removal was correct, and how?
With DDD:
- The xfail annotations are the governance proof
- Before-state: the test was written and confirmed passing (git history shows this)
- After-state: the test is xfail, confirming the code is gone
- The diff between these two commits is a complete, auditable removal record
This is especially critical in large codebases where dead code analysis tools (like Vulture) report dozens of potentially unused items. An agent can’t just delete everything Vulture flags — some methods are called dynamically, some are part of public APIs. DDD gives each removal a verifiable audit trail.
## The Sentinel Property
The xfail annotation is permanent. It stays in your test suite forever. This gives you a crucial safety property: if someone (human or AI) accidentally re-introduces the removed behavior, pytest immediately flags it as XPASS.
```
XPASS test_legacy_sync_method — removing vestigial sync_all_records per Phase 0
```
This is the dual of TDD’s regression detection. TDD catches broken additions. DDD catches broken removals.
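pytest can enforce the sentinel rather than merely report it: the built-in strict parameter of the xfail marker turns an XPASS into an outright test failure. A sketch reusing the article's test (Database is assumed to exist in the codebase under test):

```python
import pytest

# strict=True upgrades the sentinel: if the removed behavior is ever
# re-introduced, the XPASS fails the suite instead of being a warning.
@pytest.mark.xfail(
    reason="removing vestigial sync_all_records per Phase 0",
    strict=True,
)
def test_legacy_sync_method():
    db = Database()  # hypothetical class from the codebase under test
    result = db.sync_all_records()
    assert result["status"] == "ok"
```

Setting `xfail_strict = true` in pytest configuration applies this behavior suite-wide, so any accidental re-introduction fails CI immediately.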
## Unifying with Coverage Work
Here’s the elegant part: writing tests for untested code is the first half of the demolition protocol.
When you audit a codebase and find untested methods, you face a decision for each one:
- Keep it? → The test becomes a migration safety net (it stays, no xfail)
- Remove it? → The test becomes a demolition proof (mark xfail, proceed)
Either way, you wrote the test. The same work serves both purposes. The decision about what to keep vs. remove is a separate, auditable step.
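The fork amounts to a one-line difference in test code. A minimal sketch, using a hypothetical get_all_memories function under audit, shows both outcomes of the decision:

```python
import pytest

def get_all_memories():  # hypothetical untested function under audit
    return []

# Decision: keep → the test stays as a plain migration safety net
def test_get_all_memories_keep():
    assert get_all_memories() == []

# Decision: remove → the identical test, marked as a demolition proof
@pytest.mark.xfail(reason="removing get_all_memories per cleanup")
def test_get_all_memories_remove():
    assert get_all_memories() == []

# The only difference between the two decisions is the marker line
print(hasattr(test_get_all_memories_keep, "pytestmark"))    # False
print(test_get_all_memories_remove.pytestmark[0].name)      # xfail
```

Because the decision is one marker line, the keep-vs-remove choice for each method shows up in the diff as exactly one auditable line.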
## Worked Example
We used DDD to clean up the Ruach Memory server. Vulture identified 14 dead database methods and 9 dead model methods. For each:
```python
import pytest

# Phase 0: Write passing tests for dead methods
class TestVestigialMethods:
    """Tests for methods being removed in the Phase 0 cleanup."""

    @pytest.mark.xfail(reason="removing Database.get_all_memories (dead since v2)")
    def test_get_all_memories(self, db):
        result = db.get_all_memories()
        assert isinstance(result, list)

    @pytest.mark.xfail(reason="removing Database.compact_by_topic (dead since v2)")
    def test_compact_by_topic(self, db):
        result = db.compact_by_topic("test-topic")
        assert result["compacted"] >= 0
```
After removal, every one of these reported XFAIL. The 15 live untested methods got tests without xfail — they became the safety net for the rest of the refactoring.
## The Dual Table
| TDD Property | DDD Equivalent |
|---|---|
| Red → Green → Refactor | Green → xfail → Remove |
| Failing test = spec not yet met | Passing test = code exists |
| Passing test = spec satisfied | xfail = code removed |
| Regression = test goes red | Re-introduction = test goes XPASS |
| Test is living documentation | xfail is living demolition record |
## Install Nothing
DDD uses only pytest.mark.xfail — a built-in pytest feature. No plugins,
no dependencies, no configuration. The protocol is pure methodology.
This is the lowest possible adoption barrier. If your project uses pytest, you can start using DDD today.
DDD was developed during the Ruach Tov project’s Phase 0 cleanup. The full
methodology and its relationship to @fixed_by verification are documented
at aiit.dev.