№ 3: Demolition-Driven Development — The Dual of TDD

TDD tells you how to add features safely. DDD tells you how to remove them safely. Write passing tests for dead code, mark them xfail, delete the code, verify the xfail. The annotations become governance proof.

TDD Has a Dual

Test-Driven Development has a well-defined protocol for adding code:

  1. Write a failing test
  2. Write code until it passes
  3. Refactor

This is universally understood. But what about removing code? Most teams treat code removal informally: grep for usages, check nothing breaks, push. There’s no protocol that proves the removal was correct.
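The classic cycle in miniature, using a hypothetical slugify helper as the feature under test (all names here are illustrative):

```python
# 1. Write a failing test: slugify does not exist yet, so this fails.
def test_slugify():
    assert slugify("Hello, World!") == "hello-world"

# 2. Write code until it passes.
def slugify(text):
    # Lowercase, keep alphanumerics, join words with hyphens.
    words = "".join(c if c.isalnum() or c.isspace() else " " for c in text).split()
    return "-".join(w.lower() for w in words)

# 3. Refactor freely, with the passing test as a safety net.
test_slugify()
```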

Demolition-Driven Development (DDD) fills that gap. It is the exact dual of TDD:

|          | TDD (Adding)                     | DDD (Removing)                           |
|----------|----------------------------------|------------------------------------------|
| Start    | Write a failing test             | Write a passing test                     |
| Act      | Code until it passes             | Mark xfail, remove code                  |
| Verify   | Test passes                      | Test xfails                              |
| Sentinel | Passing test catches regressions | xfail catches accidental re-introduction |

The Protocol

Step 1: Write Tests That Pass

For each piece of code you want to remove, write tests that exercise it and pass. This sounds backwards — why test code you’re about to delete?

Because the test proves the code exists and works right now. If you can’t write a passing test, you’ve already learned something: the code may be unreachable, which is even stronger evidence it should go.

```python
# This test passes — confirming the vestigial method works today
def test_legacy_sync_method():
    db = Database()
    result = db.sync_all_records()  # vestigial method from v1
    assert result["synced"] == 0
    assert result["status"] == "ok"
```

Step 2: Annotate with xfail

Mark the test with @pytest.mark.xfail and a reason declaring intent:

```python
import pytest

@pytest.mark.xfail(reason="removing vestigial sync_all_records per Phase 0")
def test_legacy_sync_method():
    db = Database()
    result = db.sync_all_records()
    assert result["synced"] == 0
    assert result["status"] == "ok"
```

Step 3: Remove the Code

Delete the method, class, module, or whatever you’re removing.

Step 4: Verify

Run pytest. The xfail test should now report XFAIL (expected failure). All other tests should still pass.

If instead you see XPASS (unexpected pass), something else is providing that behavior. You’ve misidentified the removal — investigate before proceeding.
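The verify step can be scripted end to end. This sketch (the file name and test body are illustrative, and pytest must be installed) writes a demolition test for already-deleted code to a temp directory, then runs pytest with -rxX so the short summary lists xfailed (x) and xpassed (X) tests:

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# An xfail-marked test whose target code is already gone, so calling it
# raises AttributeError and the test reports XFAIL.
TEST_SRC = textwrap.dedent("""
    import pytest

    @pytest.mark.xfail(reason="removing vestigial sync_all_records per Phase 0")
    def test_legacy_sync_method():
        raise AttributeError("sync_all_records was removed")
""")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test_demolition.py")
    with open(path, "w") as f:
        f.write(TEST_SRC)
    # -rxX: include xfailed and xpassed tests in the short test summary
    proc = subprocess.run(
        [sys.executable, "-m", "pytest", "-rxX", path],
        capture_output=True, text=True,
    )

print(proc.stdout.splitlines()[-1])  # expected to mention "1 xfailed"
```

Because an XFAIL is the expected outcome, the exit code is 0 and the run stays green.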

Why This Matters for AI Agents

AI agents removing code face a unique governance problem: who verifies the removal was correct, and how?

With DDD:

  • The xfail annotations are the governance proof
  • Before-state: the test was written and confirmed passing (git history shows this)
  • After-state: the test is xfail, confirming the code is gone
  • The diff between these two commits is a complete, auditable removal record

This is especially critical in large codebases where dead code analysis tools (like Vulture) report dozens of potentially unused items. An agent can’t just delete everything Vulture flags — some methods are called dynamically, some are part of public APIs. DDD gives each removal a verifiable audit trail.

The Sentinel Property

The xfail annotation is permanent. It stays in your test suite forever. This gives you a crucial safety property: if someone (human or AI) accidentally re-introduces the removed behavior, pytest immediately flags it as XPASS. With strict=True on the mark (or xfail_strict = true in the suite's configuration), that XPASS fails the run outright.

```
XPASS test_legacy_sync_method — removing vestigial sync_all_records per Phase 0
```

This is the dual of TDD’s regression detection. TDD catches broken additions. DDD catches broken removals.

Unifying with Coverage Work

Here’s the elegant part: writing tests for untested code is the first half of the demolition protocol.

When you audit a codebase and find untested methods, you face a decision for each one:

  • Keep it? → The test becomes a migration safety net (it stays, no xfail)
  • Remove it? → The test becomes a demolition proof (mark xfail, proceed)

Either way, you wrote the test. The same work serves both purposes. The decision about what to keep vs. remove is a separate, auditable step.
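The fork above can be sketched as two tests that differ only by the xfail mark; the UserStore class and its methods are hypothetical stand-ins for audited code:

```python
import pytest

class UserStore:
    """Hypothetical audited class: one live method, one dead one."""

    def find_by_email(self, email):
        # Live but previously untested: its new test stays as a safety net.
        return {"email": email}

    def export_legacy_csv(self):
        # Dead since v2: its new test gets an xfail mark before removal.
        return "id,email\n"

# Keep it -> a plain passing test, a migration safety net
def test_find_by_email():
    assert UserStore().find_by_email("a@example.com")["email"] == "a@example.com"

# Remove it -> the same kind of test, marked xfail as a demolition proof
@pytest.mark.xfail(reason="removing UserStore.export_legacy_csv (dead since v2)")
def test_export_legacy_csv():
    assert UserStore().export_legacy_csv().startswith("id,email")
```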

Worked Example

We used DDD to clean up the Ruach Memory server. Vulture identified 14 dead database methods and 9 dead model methods. For each:

```python
# Phase 0: Write passing tests for dead methods
class TestVestigialMethods:
    """Tests for methods being removed in the Phase 0 cleanup."""

    @pytest.mark.xfail(reason="removing Database.get_all_memories (dead since v2)")
    def test_get_all_memories(self, db):
        result = db.get_all_memories()
        assert isinstance(result, list)

    @pytest.mark.xfail(reason="removing Database.compact_by_topic (dead since v2)")
    def test_compact_by_topic(self, db):
        result = db.compact_by_topic("test-topic")
        assert result["compacted"] >= 0
```

After removal, every one of these reported XFAIL. The 15 live untested methods got tests without xfail — they became the safety net for the rest of the refactoring.

The Dual Table

| TDD Property                    | DDD Equivalent                     |
|---------------------------------|------------------------------------|
| Red → Green → Refactor          | Green → xfail → Remove             |
| Failing test = spec not yet met | Passing test = code exists         |
| Passing test = spec satisfied   | xfail = code removed               |
| Regression = test goes red      | Re-introduction = test goes XPASS  |
| Test is living documentation    | xfail is living demolition record  |

Install Nothing

DDD uses only pytest.mark.xfail — a built-in pytest feature. No plugins, no dependencies, no configuration. The protocol is pure methodology.

This is the lowest possible adoption barrier. If your project uses pytest, you can start using DDD today.


DDD was developed during the Ruach Tov project’s Phase 0 cleanup. The full methodology and its relationship to @fixed_by verification are documented at aiit.dev.