Building a Drupal contrib module with AI-assisted TDD

This module went through three complete rewrites with three different AI coding tools over 14 months. Each rewrite had to pass the same test suite and run in production. Here is what I learned.

The spec came first

Before any tool generated a line of code, I wrote a functional specification. It took longer than any single iteration of the implementation. That was the right trade-off.

Before starting iteration 3, I fed this prompt into ChatGPT Pro with deep research mode enabled. It ran for 8 minutes:

For codelift migration projects we will build a drupal module
"config deterministic uuid".

Objective: have a q&a session that results in a requirements document.
Focussing on functional requirements, complete automated test coverage
and on technical implementation.

Initial input

We need a reliable contrib module with automated test coverage. The main
purpose is to make sure that drupal config uuid are deterministic. This
is possible with uuid spec version 5. In php this is possible with the
package ramsey/uuid. We need full test coverage for the following
scenario outline:

DEEP RESEARCH MODE ON
1. All case coverage for New and Existing config (i.e. full CRUD)
2. Config object (example drupal View)
3. Simple config (has no uuid)
4. All other things in drupal that have Uuids and could interfere with
   this system (i.e. no side effect!)
5. Drush config:export, drush config:import
6. Expose tool to update existing uuid at the start of introducing the
   deterministic config uuid module to a codebase
DEEP RESEARCH MODE OFF

We already have the module working but the test coverage is not good
(task lookup and analyse module on drupal.org)
https://git.drupalcode.org/project/config_uuid_deterministic

Elaborate on various strategies where to init the module technically.
For example as decorator or by patching existing code. Research common
patterns and paradigms but only focus on those with a high likelihood
of success.

Why is this module needed: During codelift migrations drupal config is
created and updated in 2 distinct lifecycles:
1. Migrations generate config during site installs and migrations.
   This process reruns many times.
2. Post-migrations config is altered (updated, deleted, created).

Above interfere very bad when uuid are random. Above work together in
symbiosis when uuid are deterministic, because we can then leverage
drupal battle-tested config:export and config:import tools.

You are a seasoned drupal engineer, you like simple understandable DRY.
You are a seasoned quality assurance expert that knows about all details
for proving a system works as expected with zero regressions.

Lets start the question and answer session. After all questions are
answered, provide the REQUIREMENTS document for CodeLift drupal contrib
module: config_uuid_deterministic.

The Q&A session that followed produced the spec, MASTER-REQUIREMENTS.md. It has 10 sections covering platform support, UUID generation rules, CRUD semantics, skip-lists, architecture strategies, and test coverage goals. Here are the section headers:

Purpose & context
Supported platforms (Drupal/PHP matrix, ramsey/uuid dependency)
High-level behaviour (deterministic config UUIDs, zero side effects)
Deterministic UUID specification
Scope of configuration handling (covered types, skip-lists, recursive traversal)
CRUD semantics & lifecycle behaviour
Integration with Drush & config tools
Technical architecture & module init strategies
Automated test strategy
Coverage & quality goals

The critical section is the deterministic UUID formula:

Root config entities:
  Name string: config:<collection>:<config_name>

Nested plugin instances:
  Name string: config:<collection>:<config_name>:<path>

Constraint: For the same (collection, config_name, path)
the UUID must be identical on all installs.

This formula is the contract. Every implementation had to satisfy it. Every test validates it. The spec did not change between iterations 2 and 3. The implementation changed; the contract did not.
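To make the contract concrete, here is a minimal sketch of the name-string formula feeding a UUIDv5. It is written in Python rather than the module's PHP so it runs with the standard library alone. The helper names, the "default" collection label, and the dotted path format are illustrative assumptions, and the nil-UUID namespace is taken from the iteration comparison later in this article, not from the module's source.

```python
import uuid

# Assumption: the nil UUID serves as the v5 namespace (the article lists
# "Nil UUID" as the namespace for iteration 3).
NAMESPACE = uuid.UUID(int=0)


def config_name_string(collection, config_name, path=None):
    """Build the spec's name string; nested plugin instances append a path."""
    parts = ["config", collection, config_name]
    if path:
        parts.append(path)
    return ":".join(parts)


def deterministic_uuid(collection, config_name, path=None):
    """Derive a UUIDv5 from the name string: same input, same UUID."""
    name = config_name_string(collection, config_name, path)
    return str(uuid.uuid5(NAMESPACE, name))


# Hypothetical inputs: the collection label and path format are placeholders.
root = deterministic_uuid("default", "views.view.frontpage")
nested = deterministic_uuid("default", "views.view.frontpage", "display.default")
print(root)
print(nested)
```

Because the only inputs are the collection, the config name, and the path, two independent installs that run this produce the same UUIDs, which is the constraint the spec states.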

The spec also documents five architectural strategies I evaluated, with explicit reasoning for why four were rejected. Strategy A (storage decorators) was chosen. Strategy D (override the uuid service) was rejected because it would affect content entity UUIDs. Strategy E (core patches) was rejected because this needed to ship as contrib. These decisions predated any AI involvement. They required an understanding of Drupal's config system internals that no tool could substitute for.
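The decorator idea behind Strategy A can be sketched in a few lines. This is a language-neutral illustration in Python, not the module's PHP service wiring: both classes, the `write`/`read` interface, and the "default" collection label are hypothetical stand-ins, and the nil-UUID namespace is an assumption.

```python
import uuid

NAMESPACE = uuid.UUID(int=0)  # assumption: nil UUID as the v5 namespace


class InMemoryStorage:
    """Hypothetical stand-in for a config storage backend."""

    def __init__(self):
        self.items = {}

    def write(self, name, data):
        self.items[name] = dict(data)
        return True

    def read(self, name):
        return self.items.get(name)


class DeterministicUuidStorage:
    """Decorator: same interface, rewrites 'uuid' before delegating writes."""

    def __init__(self, inner, collection="default"):
        self.inner = inner
        self.collection = collection

    def write(self, name, data):
        if "uuid" in data:
            # Config with a uuid key gets the deterministic value; the
            # caller's dict is copied, not mutated.
            name_string = f"config:{self.collection}:{name}"
            data = {**data, "uuid": str(uuid.uuid5(NAMESPACE, name_string))}
        # Simple config (no uuid key) passes through unchanged.
        return self.inner.write(name, data)

    def read(self, name):
        return self.inner.read(name)


storage = DeterministicUuidStorage(InMemoryStorage())
storage.write("views.view.frontpage", {"uuid": "random-value", "label": "Frontpage"})
storage.write("system.site", {"name": "Example site"})
print(storage.read("views.view.frontpage")["uuid"])
```

The wrapper only touches data flowing through config storage, so anything that gets its UUIDs elsewhere, such as content entities, is never in its path. That is the property that ruled out overriding the uuid service.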

The development environment

The working directory is minimal:

config_uuid_deterministic_development/
├── web/modules/custom/config_uuid_deterministic/
├── scripts/run-phpunit-tests-native.sh
├── phpunit.xml
└── plan-do-check-action/docs/MASTER-REQUIREMENTS.md

The test runner is 12 lines of bash:

#!/bin/bash
# This script runs PHPUnit tests using native PHP (Valet/local).

# Get the directory where this script is located
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(dirname "$SCRIPT_DIR")"

cd "$PROJECT_ROOT" || exit 1

# Set environment variables for Valet/native testing
export SIMPLETEST_BASE_URL="https://config-uuid-deterministic-development.test"
# DB user and host were obscured in the published script; substitute your own.
export SIMPLETEST_DB="mysql://DB_USER@DB_HOST:3306/config_uuid_deterministic_development_test"

# Run PHPUnit with absolute path to config
vendor/bin/phpunit -c "$PROJECT_ROOT/phpunit.xml" --testdox web/modules/custom/

Every AI tool ran this script. No special integration. No IDE plugins. No tool-specific prompts beyond the spec. The loop was always the same:

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│  Functional  │     │  AI Coding   │     │  Test Suite  │
│  Spec        │────▶│  Tool        │────▶│  (PHPUnit)   │
│  (human)     │     │  (generates) │     │  (validates) │
└──────────────┘     └──────────────┘     └──────────────┘
                           ▲                      │
                           │    RED / GREEN       │
                           └──────────────────────┘

The AI tool is interchangeable. The spec and the tests are not.

The three iterations

Iteration 1: codelift_deterministic_uuid -- Aider + OpenAI o1 (January 2025)

This was the first time an agentic coding tool solved bugs beyond my own ability. OpenAI's o1 reasoning model could trace through Drupal's config storage chain -- following which service writes when and in what order -- in ways I found difficult to hold in working memory. It produced a working prototype that I deployed on a production D7-to-D10 migration project. But test coverage was thin, and the implementation used hash_salt as the UUID namespace, making it site-specific and non-portable across environments.

Iteration 2: codelift_deterministic_config_uuid -- ChatGPT Pro + Aider (March 2025)

I wrote the core business logic in ChatGPT Pro's interface, then used Aider for file-level edits and refactoring. The architecture improved: generator, remapper, and storage layers were separated into distinct services. But it was still project-specific code. No contrib-quality test coverage. No CI pipeline. No documentation beyond inline comments.

Iteration 3: config_uuid_deterministic -- Claude Code (June 2025 - February 2026)

I started from the functional spec. The first instruction to the tool was to write failing tests -- 22 tests (19 kernel, 3 unit) covering every section of the spec. Then the cycle was mechanical: run the test script, read the red output, fix the code, run the test script again. This iteration took the longest calendar time but produced the only version worth publishing. 95.18% coverage. GitHub Actions CI running against PHP 8.1-8.3 and Drupal 9.4, 10, and 11. Published on drupal.org.

| | Iteration 1 | Iteration 2 | Iteration 3 |
|---|---|---|---|
| Tool | Aider + o1 | ChatGPT Pro + Aider | Claude Code |
| Module name | `codelift_deterministic_uuid` | `codelift_deterministic_config_uuid` | `config_uuid_deterministic` |
| Tests | Minimal | Some | 22 tests, 95.18% coverage |
| CI | None | None | GitHub Actions (PHP 8.1-8.3, Drupal 9.4/10/11) |
| Namespace | `hash_salt` (site-specific) | `hash_salt` | Nil UUID (portable) |
| Architecture | Monolithic | Separated layers | Decorator pattern with services |
| Status | Production prototype | Production use | drupal.org contrib |

The progression was not about tools getting better. It was about the spec getting more complete and the tests getting more thorough. Iteration 3 succeeded because the requirements were finalized, not because the tool was superior.

The production feedback loop

The test suite did not emerge fully formed from the spec. Production use drove half of it.

On January 23, 2026, a migration reinstall on a project with long field names resulted in missing field data. The database schema existed, but the hashed table name no longer matched the UUID in the field storage config. When a field table name would exceed 48 characters, Drupal shortens it by replacing the field name with a 10-character hash derived from SHA-256(uuid). Change the UUID and the hash changes. Once the hash changes, Drupal looks for a table that does not exist.
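The failure mode is easy to reproduce outside Drupal. The sketch below models the table-naming rule as described here: a 48-character limit and a 10-character hash taken from SHA-256 of the UUID. The function name, the shortened `_r__` separator for revision tables, and the sample field name are my assumptions for illustration, not a copy of core's code.

```python
import hashlib

MAX_TABLE_NAME = 48  # threshold named in the article


def field_table_name(entity_type, field_name, storage_uuid, revision=False):
    """Return a field table name, falling back to a hash when it is too long."""
    separator = "_revision__" if revision else "__"
    name = f"{entity_type}{separator}{field_name}"
    if len(name) <= MAX_TABLE_NAME:
        return name
    # Assumption: a shorter separator plus the first 10 hex characters of
    # SHA-256(uuid) stands in for the field name.
    short_separator = "_r__" if revision else "__"
    digest = hashlib.sha256(storage_uuid.encode("utf-8")).hexdigest()[:10]
    return f"{entity_type}{short_separator}{digest}"


# Hypothetical long field name and two different UUIDs for the same field.
long_field = "field_a_very_long_migrated_field_name_from_drupal7"
before = field_table_name("node", long_field, "1b4e28ba-2fa1-11d2-883f-0016d3cca427")
after = field_table_name("node", long_field, "6fa459ea-ee8a-3ca4-894e-db77e160355e")
print(before, after, before != after)
```

A short field name maps to the same table regardless of UUID, which is why the bug only appeared on a project with unusually long field names.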

The first thing I did was write a failing test:

UUID should NOT be remapped for field storage with hashed table names

The test creates a field storage with a name long enough to trigger table name hashing, creates content using that field, then enables the module. If the module remaps the UUID, the hashed table name changes and the content becomes inaccessible. The assertion is simple: after enabling the module, the UUID must remain unchanged and the field data must still be readable.

The commit chain shows the TDD cycle:

a6d1b89 (2026-01-23) -- Failing test proving hashed table UUID mismatch bug
3358ca4 (2026-01-24) -- First fix: skip-list check in hook_entity_presave
ae04ab7 (2026-01-25) -- ConfigTableMismatchDetector service
ba026fe (2026-02-26) -- Final fix: context-aware skip logic

Four commits over 34 days. The first fix was a skip-list -- a hardcoded list of configs to leave alone. The second approach extracted the detection into a dedicated service. The final fix implemented context-aware logic: file storage (no database) always normalizes; active storage (has database) checks for existing hashed tables before normalizing.
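The context-aware rule from the final fix reduces to a small decision function. This is a sketch of the logic as described in the paragraph above, with hypothetical parameter names. How the real module detects a database-backed storage or an existing hashed table is not shown here.

```python
def should_normalize(storage_has_database, has_existing_hashed_table):
    """Decide whether a config UUID may be rewritten to its deterministic value."""
    if not storage_has_database:
        # File storage: no tables can be orphaned, so always normalize.
        return True
    # Active storage: keep the current UUID when a hashed table depends on it.
    return not has_existing_hashed_table


for has_db in (False, True):
    for hashed in (False, True):
        print(has_db, hashed, should_normalize(has_db, hashed))
```

Unlike the first fix's hardcoded skip-list, this rule depends only on the state of the storage being written to, so it does not need to know any field names in advance.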

This sequence proves something I think matters: production use drove the test suite, not the other way around. No spec, no matter how thorough, would have predicted the hashed table edge case. It required a real project with a real field name that happened to exceed 48 characters when prefixed with the entity type and _revision__. The test exists because a migration broke.

Where AI helped and where it did not

Where AI excelled

Generating boilerplate test scaffolding. Kernel test base classes in Drupal require entity schema installation, module dependency declarations, and configuration directory setup. This is well-documented work that follows patterns. Every AI tool handled it correctly.

Implementing well-specified algorithms. The UUIDv5 generator, the recursive array walker for nested plugin detection, the config name string builder -- these are functions with clear inputs, clear outputs, and clear constraints. Given the spec, any competent tool produced correct implementations on the first or second attempt.

Refactoring across multiple files simultaneously. Renaming a service, updating all injection points, adjusting test references -- this is mechanical work that AI tools do faster and more reliably than manual editing.
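As an example of what "well-specified" means, the recursive walker mentioned above can be sketched as follows. This is an illustrative Python version under stated assumptions: the function name, the dotted path format, the "default" collection label, and the sample config structure are hypothetical, and the nil UUID is assumed as the namespace. It also assumes the keys along a path are themselves stable.

```python
import uuid

NAMESPACE = uuid.UUID(int=0)  # assumption: nil UUID as the v5 namespace


def remap_uuids(data, collection, config_name, path=()):
    """Return a copy of data with every 'uuid' string made deterministic."""
    if isinstance(data, dict):
        result = {}
        for key, value in data.items():
            if key == "uuid" and isinstance(value, str):
                # Root level uses the bare name; nested levels append the path.
                parts = ["config", collection, config_name]
                if path:
                    parts.append(".".join(path))
                result[key] = str(uuid.uuid5(NAMESPACE, ":".join(parts)))
            else:
                result[key] = remap_uuids(
                    value, collection, config_name, path + (str(key),)
                )
        return result
    if isinstance(data, list):
        return [
            remap_uuids(item, collection, config_name, path + (str(index),))
            for index, item in enumerate(data)
        ]
    return data


# Hypothetical config structure with a root UUID and one nested plugin UUID.
config = {
    "uuid": "1b4e28ba-2fa1-11d2-883f-0016d3cca427",
    "label": "Example",
    "display": {
        "blocks": {
            "hero": {"uuid": "6fa459ea-ee8a-3ca4-894e-db77e160355e", "id": "hero_block"}
        }
    },
}
remapped = remap_uuids(config, "default", "example.settings")
print(remapped["uuid"])
print(remapped["display"]["blocks"]["hero"]["uuid"])
```

The function has clear inputs, a clear output, and one rule per branch, which is the kind of task the tools got right with little guidance.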

Where AI failed or needed heavy guidance

Understanding Drupal's config storage lifecycle was the persistent gap. Which service writes when. What order hooks fire in. How ConfigImporter differs from ConfigInstaller. How site:install bootstraps config before contributed modules load. No AI tool had a reliable model of these interactions. I corrected sequencing assumptions in every iteration.

The hashed table edge case came entirely from production. No AI tool predicted it. No amount of prompting would have surfaced it. It required running real migrations with real data on a real codebase.

Service provider wiring was consistently fragile. Drupal's dependency injection patterns diverge from standard Symfony in ways that AI tools do not model well. In an early iteration, the AI suggested replacing config.factory entirely instead of decorating the storage services. This would have broken config entity dependency resolution across core. The fix was not a small adjustment -- it required understanding why Drupal separates config.factory from config.storage and which one is safe to intercept.

Where the human was irreplaceable

Writing the functional specification. Knowing what the module needed to do, which use cases it served, which architectural patterns were safe and which were not. No tool contributed to this.

Designing the test cases. Knowing what to test is harder than writing the test. The 22 tests cover config CRUD, nested plugin remapping, skip-list behavior, Drush integration, and the hashed table edge case. The coverage map came from understanding Drupal's config system, not from generating code.

Discovering edge cases through production use. The spec predicted some failure modes. Production revealed others. No test suite can anticipate what production reveals.

Making architectural decisions. Decorators vs. service replacement. Nil UUID vs. hash_salt. Which hooks to use. When to skip normalization. These are judgment calls that require knowing Drupal's internals well enough to predict downstream effects.

The tools accelerated execution, not understanding.

Closing

The module is available on drupal.org: config_uuid_deterministic. The development environment and full commit history are on GitLab: config_uuid_deterministic_development.

The methodology -- spec first, failing tests, iterate with whatever tool is best -- is the part worth keeping.

Read the full series: Part 1: The Failure | Part 2: The Engineering