Making Drupal config UUIDs deterministic: storage decorators, UUIDv5, and edge cases

UUIDv5 with config name as input. Storage decorators that intercept every write path. A hashed-table edge case that almost caused data loss. Here is how the module works.

In the previous post, I described the problem: random config UUIDs make iterative migration diffs useless. The constraint is strict: make config UUIDs deterministic without patching core or affecting content entities. Here is how I built it.

Why UUIDv5

Before landing on the final approach, I evaluated and rejected four alternatives:

Strategy                         | Problem
Override Uuid::generate()        | Affects content entities (nodes, users, media). Those must stay random.
Post-save hook rewrite           | Race condition with config sync. UUID is already written before the hook fires.
Custom config entity base class  | Requires patching every config entity type. Unmaintainable.
Replace config.factory           | Too broad. Breaks assumptions across core subsystems.
UUIDv5 with config name as input | Deterministic. Scoped to config only. No core patches.

UUID version 5 is name-based. You feed it a namespace and a name string, and it always produces the same UUID for the same input. The name string format:

config:<collection>:<config_name>

A real example: config:default:block.block.claro_page_title always produces ab0d2010-dc70-57f1-a4a8-590cc1463e7c. Same input, same output. On your laptop, in CI, on a fresh install six months from now.

The namespace is the nil UUID: 00000000-0000-0000-0000-000000000000. This is deliberate. The namespace is not tied to hash_salt or any site-specific secret. Determinism must hold across fresh installs, CI pipelines, and developer machines. A site-specific namespace would defeat the entire purpose.

Here is the generator:

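// Assumed import: ramsey/uuid, whose Uuid::uuid5() matches the call below.
use Ramsey\Uuid\Uuid;

/**
 * Generates name-based (version 5) UUIDs for config objects.
 */
class ConfigUuidGenerator {

  // Nil UUID: deliberately not site-specific, so every environment agrees.
  public const NAMESPACE = '00000000-0000-0000-0000-000000000000';

  public function generate(string $collection, string $name, ?string $path = NULL): string {
    // Name string format: config:<collection>:<config_name>[:<path>].
    $nameString = "config:{$collection}:{$name}";
    if ($path !== NULL) {
      $nameString .= ":{$path}";
    }

    return Uuid::uuid5(self::NAMESPACE, $nameString)->toString();
  }

}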
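
To make the generator less of a black box, here is a minimal sketch of what a version-5 UUID computation does, written in plain PHP without a UUID library. The function name uuid5_sketch is hypothetical and not part of the module; it only illustrates why the same namespace and name string always give the same result.

```php
<?php

/**
 * Illustrative RFC 4122 version-5 UUID (hypothetical helper, not module code).
 */
function uuid5_sketch(string $namespace, string $name): string {
  // Convert the namespace UUID to its 16 raw bytes.
  $nsBytes = hex2bin(str_replace('-', '', $namespace));
  // SHA-1 over namespace bytes + name; only the first 16 bytes are kept.
  $bytes = substr(sha1($nsBytes . $name, TRUE), 0, 16);
  // Byte 6: overwrite the high nibble with the version number (5).
  $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x50);
  // Byte 8: overwrite the top two bits with the RFC 4122 variant (10).
  $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);

  $hex = bin2hex($bytes);
  return implode('-', [
    substr($hex, 0, 8),
    substr($hex, 8, 4),
    substr($hex, 12, 4),
    substr($hex, 16, 4),
    substr($hex, 20, 12),
  ]);
}

$nil = '00000000-0000-0000-0000-000000000000';
$first = uuid5_sketch($nil, 'config:default:block.block.claro_page_title');
$second = uuid5_sketch($nil, 'config:default:block.block.claro_page_title');

// Same namespace and name give the same UUID on every run and machine.
var_dump($first === $second);
```

Nothing in the computation depends on time, hostname, or a random source, which is the property the module relies on.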

A note on SHA-1. UUIDv5 uses SHA-1 internally, and someone will bring this up. The concern is misplaced here. SHA-1 collision attacks require adversarial input crafting over a massive input space. Our input space is Drupal config names -- finite, controlled, and defined by module code. We are not using SHA-1 for cryptographic guarantees. We are using it for determinism. The collision probability across the set of config names in any Drupal installation is effectively zero.
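
For a rough sense of scale, the birthday approximation can be evaluated directly. A version-5 UUID keeps 122 bits of the hash (128 minus 4 version bits and 2 variant bits). The figure of 100,000 config names below is an assumed, deliberately generous site size, and the estimate treats the hash output as uniformly distributed.

```php
<?php

// Assumed, deliberately generous number of config names on one site.
$configCount = 100000;

// Bits of a version-5 UUID that come from the hash: 128 - 4 (version) - 2 (variant).
$hashBits = 122;

// Birthday approximation: number of pairs divided by the number of possible values.
$pairs = $configCount * ($configCount - 1) / 2;
$probability = $pairs / pow(2.0, $hashBits);

printf("Approximate collision probability: %.2e\n", $probability);
```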

The storage decorator pattern

Drupal's config system reads and writes through storage services. config.storage.active wraps the database. config.storage.sync wraps the file system. Every drush cex, drush cim, module enable, and site:install flows through these services.

The module decorates both:

┌─────────────────────────────────────────────────┐
│                  Drupal Core                     │
│                                                  │
│  config.storage.active    config.storage.sync    │
│         │                        │               │
│         ▼                        ▼               │
│  ┌──────────────┐    ┌───────────────────────┐   │
│  │ DatabaseStorage│    │ FileStorage           │   │
│  └──────────────┘    └───────────────────────┘   │
└─────────────────────────────────────────────────┘

             ▼ module enabled ▼

┌─────────────────────────────────────────────────┐
│          config_uuid_deterministic               │
│                                                  │
│  config.storage.active    config.storage.sync    │
│         │                        │               │
│         ▼                        ▼               │
│  ┌──────────────────┐  ┌─────────────────────┐   │
│  │ Deterministic     │  │ FileStorage         │   │
│  │ ActiveStorage     │  │ Deterministic       │   │
│  │  (extends DB)     │  │  (extends File)     │   │
│  └──────────────────┘  └─────────────────────┘   │
│         │                        │               │
│         ▼                        ▼               │
│  ┌──────────────────────────────────────────┐    │
│  │     ConfigUuidGenerator (UUIDv5)         │    │
│  │     ConfigUuidRemapper (nested plugins)  │    │
│  └──────────────────────────────────────────┘    │
└─────────────────────────────────────────────────┘

The storage classes override write() and read(). On write, the UUID is computed from the config name and injected before the data reaches the underlying storage. On read, the same normalization happens so that even if the database contains a random UUID, the caller always sees the deterministic one.

This matters because config enters the system through multiple paths:

  • ConfigImporter writes to active storage during drush cim. The decorator intercepts this.
  • ConfigInstaller writes during module enable and profile install. The decorator intercepts this.
  • site:install seeds config before any contributed module can hook in. The decorator intercepts this too, because it replaces the storage service definition itself.
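
The write and read normalization described above can be sketched with a toy storage class. Everything here is hypothetical and simplified: ArrayStorageSketch stands in for the real database-backed storage, DeterministicStorageSketch is not the module's class, and skipping records that have no uuid key is an assumption of this sketch.

```php
<?php

/** Hypothetical, simplified stand-in for a config storage backend. */
class ArrayStorageSketch {

  public function __construct(protected array $records = []) {}

  public function write(string $name, array $data): bool {
    $this->records[$name] = $data;
    return TRUE;
  }

  public function read(string $name): array|false {
    return $this->records[$name] ?? FALSE;
  }

}

/** Extends the backend and normalizes the top-level uuid on write and read. */
class DeterministicStorageSketch extends ArrayStorageSketch {

  public function write(string $name, array $data): bool {
    return parent::write($name, $this->normalize($name, $data));
  }

  public function read(string $name): array|false {
    $data = parent::read($name);
    return is_array($data) ? $this->normalize($name, $data) : $data;
  }

  protected function normalize(string $name, array $data): array {
    // Assumption: records without a uuid key are left untouched.
    if (isset($data['uuid'])) {
      $data['uuid'] = $this->uuid5('config:default:' . $name);
    }
    return $data;
  }

  protected function uuid5(string $name): string {
    // Nil namespace = 16 zero bytes, matching the namespace used in this post.
    $bytes = substr(sha1(str_repeat("\0", 16) . $name, TRUE), 0, 16);
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x50);
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
  }

}

$storage = new DeterministicStorageSketch([
  // A record saved before the module was enabled, with a random UUID.
  'views.view.content' => ['uuid' => '3f2c9a1e-0000-4000-8000-000000000001', 'id' => 'content'],
]);
$storage->write('image.style.thumbnail', ['uuid' => 'anything', 'name' => 'thumbnail']);

$legacy = $storage->read('views.view.content');
$fresh = $storage->read('image.style.thumbnail');

// The legacy record is reported with a deterministic UUID even though the
// stored value is still the random one.
var_dump($legacy['uuid'] !== '3f2c9a1e-0000-4000-8000-000000000001');
```

Because the subclass replaces the storage itself, callers such as an importer or installer need no knowledge of the normalization.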

The sync storage uses a ServiceProvider to swap the factory:

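use Drupal\Core\Config\FileStorageFactory;
use Drupal\Core\DependencyInjection\ContainerBuilder;
use Drupal\Core\DependencyInjection\ServiceProviderBase;

class ConfigUuidDeterministicServiceProvider extends ServiceProviderBase {

  public function alter(ContainerBuilder $container) {
    $service_name = 'config.storage.sync';
    if ($container->hasDefinition($service_name)) {
      $definition = $container->getDefinition($service_name);
      // Only swap the factory when core's default is still in place, so
      // another module's override is left alone.
      if ($definition->getFactory() === [FileStorageFactory::class, 'getSync']) {
        // ConfigSyncStorageFactory is the module's own factory class.
        $definition->setFactory([ConfigSyncStorageFactory::class, 'getSync']);
      }
    }
  }

}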

Why storage extension instead of a broader approach like replacing config.factory or decorating ConfigEntityBase? Three reasons.

First, it preserves all existing storage behavior. The decorators only add UUID normalization to write() and read(). Every other operation -- listAll(), delete(), rename() -- passes through unchanged.

Second, the module can be enabled and disabled cleanly. No config entities are modified in a way that leaves orphaned state. Disable the module and you are back to random UUIDs on next save.

Third, it does not touch content entities. The uuid service that generates UUIDs for nodes, users, and media remains untouched. Only config storage is intercepted.

The nested plugin problem

Config entities have a problem that simple config does not: nested plugins. Image style effects, view display handlers, text format filters, layout builder components. These are plugin instances stored inside the parent config, each with its own UUID used as both the YAML mapping key and the identity value.

Here is image.style.thumbnail from a real migration project:

effects:
  c5973c87-e49d-57c9-8e42-5e8e7e29dba2:
    uuid: c5973c87-e49d-57c9-8e42-5e8e7e29dba2
    id: image_scale
    weight: 0
    data:
      width: 100
      height: 100
      upscale: true
  28b42f10-08b4-512e-ab8a-251e7c5354cb:
    uuid: 28b42f10-08b4-512e-ab8a-251e7c5354cb
    id: image_crop
    weight: 2

The UUID appears twice for each plugin: as the YAML mapping key and as the uuid value inside. Both must change together. If they diverge, Drupal cannot load the plugin instance.

The ConfigUuidRemapper walks the config array recursively. Detection rule: any array element containing both id and uuid keys is a plugin instance.

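/**
 * Recursively replaces nested plugin UUIDs with deterministic ones.
 *
 * $pathSegments tracks the position inside the config array and becomes
 * part of the name string fed to UUIDv5.
 */
public static function remapNested(array $data, string $configName, array $pathSegments = []): array {
  $out = [];

  foreach ($data as $key => $value) {
    // Plugin instance: an array carrying both id and uuid.
    if (is_array($value) && isset($value['id'], $value['uuid'])) {
      $newPath = array_merge($pathSegments, [$value['id']]);
      $stableName = $configName . '|' . implode('|', $newPath);
      $newUuid = Uuid::uuid5(ConfigUuidGenerator::NAMESPACE, $stableName)->toString();
      // Update the inner value and the mapping key together.
      $value['uuid'] = $newUuid;
      $out[$newUuid] = self::remapNested($value, $configName, $newPath);
    }
    // Plain nested array: keep the key, extend the path, keep walking.
    elseif (is_array($value)) {
      $newPath = array_merge($pathSegments, [$key]);
      $out[$key] = self::remapNested($value, $configName, $newPath);
    }
    // Scalar: copy unchanged.
    else {
      $out[$key] = $value;
    }
  }

  return $out;
}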

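The path-based naming uses the plugin's id field, not the UUID key. For the image_scale effect in image.style.thumbnail, the name string is image.style.thumbnail|effects|image_scale. The id field is stable: it comes from the plugin definition, not from randomness. As long as the config structure is stable, the deterministic UUID is stable. One consequence of keying on id: as written, two sibling plugins with the same id (for example two image_scale effects in one style) produce the same name string, so the rule assumes ids are unique among siblings.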
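
A small usage sketch shows the effect on two exports of the same image style that differ only in their random UUIDs. The functions nil_uuid5 and remap_nested_sketch are hypothetical stand-ins: the first replaces the library call with a plain-PHP version-5 computation, the second repeats the walk shown above as a free function so the example runs on its own.

```php
<?php

/** Plain-PHP version-5 UUID under the nil namespace (illustrative helper). */
function nil_uuid5(string $name): string {
  $bytes = substr(sha1(str_repeat("\0", 16) . $name, TRUE), 0, 16);
  $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x50);
  $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);
  return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

/** Same walk as remapNested() above, as a free function for the demo. */
function remap_nested_sketch(array $data, string $configName, array $path = []): array {
  $out = [];
  foreach ($data as $key => $value) {
    if (is_array($value) && isset($value['id'], $value['uuid'])) {
      // Plugin instance: derive the UUID from the config name and id path.
      $newPath = array_merge($path, [$value['id']]);
      $uuid = nil_uuid5($configName . '|' . implode('|', $newPath));
      $value['uuid'] = $uuid;
      $out[$uuid] = remap_nested_sketch($value, $configName, $newPath);
    }
    elseif (is_array($value)) {
      $out[$key] = remap_nested_sketch($value, $configName, array_merge($path, [$key]));
    }
    else {
      $out[$key] = $value;
    }
  }
  return $out;
}

// Two exports of the same image style that differ only in their random UUIDs.
$exportA = ['effects' => ['aaaa' => ['uuid' => 'aaaa', 'id' => 'image_scale', 'weight' => 0]]];
$exportB = ['effects' => ['bbbb' => ['uuid' => 'bbbb', 'id' => 'image_scale', 'weight' => 0]]];

$a = remap_nested_sketch($exportA, 'image.style.thumbnail');
$b = remap_nested_sketch($exportB, 'image.style.thumbnail');

// After remapping, both exports are identical, and key and uuid value agree.
var_dump($a === $b);
```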

This handles every nested plugin pattern in Drupal core: view filters and sort handlers, image style effects, text format filters, editor settings, and layout builder section components.

The hashed table trap

This is the edge case that almost broke the module.

Field storage configs create database tables. field.storage.node.field_name produces tables named node__field_name for data and node_revision__field_name for revisions. Standard Drupal behavior.

But Drupal caps generated field table names at 48 characters, which leaves a margin for database prefixes. When a name exceeds that threshold, Drupal switches to a shorter separator (_r__ for revision tables) and replaces the field name with a 10-character hash derived from the field storage UUID. The hash function: substr(hash('sha256', $uuid), 0, 10).

Change the UUID and the hash changes. The hash changes and the table name changes. The table name changes and Drupal looks for a table that does not exist.

Here is the real scenario from the test suite:

Data table: taxonomy_term__field_long_enough_for_hash (41 chars) -- not hashed
Revision table: taxonomy_term_revision__field_long_enough_for_hash (50 chars) > 48 -- HASHED

The data table is 41 characters. Fine. The revision table is 50 characters. Exceeds the limit. Drupal hashes it to something like taxonomy_term_r__a3f8b2c1d9. The a3f8b2c1d9 comes from SHA-256 of the current UUID.

Sidenote: I filed #3223605 against Drupal core in 2021, asking to increase the 32-character limit on field machine names. I had no idea at the time that the table name hashing triggered by long names would come back to haunt me four years later in an entirely different context.

Now enable the module. The storage decorator sees field.storage.taxonomy_term.field_long_enough_for_hash, computes a deterministic UUID, and writes it. The hash changes to, say, e7d4f1a2b3. Drupal now expects the revision table to be taxonomy_term_r__e7d4f1a2b3. That table does not exist. The real table is still taxonomy_term_r__a3f8b2c1d9.
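
The naming rule and its consequence can be reproduced in a few lines. This is a sketch of the rule as described in this post, not core's code: the function name field_table_name_sketch is hypothetical, database table prefixes are ignored, and both UUIDs are made-up placeholders.

```php
<?php

/**
 * Sketch of the field table naming rule described above (not core's code).
 */
function field_table_name_sketch(string $entityType, string $fieldName, string $uuid, bool $revision): string {
  // Entity type IDs are capped at 32 characters.
  $entityType = substr($entityType, 0, 32);
  $name = $entityType . ($revision ? '_revision__' : '__') . $fieldName;
  if (strlen($name) > 48) {
    // Too long: shorter separator plus a 10-character hash of the UUID.
    $hash = substr(hash('sha256', $uuid), 0, 10);
    $name = $entityType . ($revision ? '_r__' : '__') . $hash;
  }
  return $name;
}

$field = 'field_long_enough_for_hash';
// Placeholder UUIDs: one "random" value from the database, one deterministic.
$randomUuid = '9b2f4c6e-1d3a-4f5b-8c7d-0e1f2a3b4c5d';
$deterministicUuid = '5e8d7c6b-4a39-5281-9706-f5e4d3c2b1a0';

$data = field_table_name_sketch('taxonomy_term', $field, $randomUuid, FALSE);
$before = field_table_name_sketch('taxonomy_term', $field, $randomUuid, TRUE);
$after = field_table_name_sketch('taxonomy_term', $field, $deterministicUuid, TRUE);

echo $data, "\n";    // Unhashed: the name fits within 48 characters.
echo $before, "\n";  // Hashed with the old UUID; this table holds the data.
echo $after, "\n";   // Hashed with the new UUID; this table does not exist.
```

The data table name never involves the UUID here, so only the revision table is at risk in this scenario.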

The failure modes:

  • Schema mismatch on entity load
  • Missing field tables
  • Silent data disappearance when loading entities with that field
  • Fatal SQL errors: SQLSTATE[42S02]: Base table or view not found

This is not cosmetic. This is data loss.

The solution is context-aware skip logic in three layers. The first two live in the FieldTableNameChecker trait that both storage classes use; the third is a hook that runs when a field storage is created:

File storage context (no database access): always normalize. This is safe because no tables exist to break. The sync directory should always contain deterministic UUIDs.

Active storage context (has database access): before normalizing, check if hashed tables exist with the current UUID. If they do, skip the normalization for this config. Log a warning. Protect existing data.

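protected function hasConflictingHashedTables(string $configName, array $data, $database = NULL): bool {
  if (!$this->wouldUseHashedTableName($configName, $data)) {
    return FALSE;
  }

  // File storage context: no database connection, so no tables to protect.
  if ($database === NULL) {
    return FALSE;
  }

  // Without a stored UUID or entity type there is no hashed table to find.
  if (empty($data['uuid']) || empty($data['entity_type'])) {
    return FALSE;
  }

  $currentUuid = $data['uuid'];
  // Same name string as ConfigUuidGenerator::generate() for the default
  // collection, so this check agrees with what the storage would write.
  $deterministicUuid = Uuid::uuid5(ConfigUuidGenerator::NAMESPACE, "config:default:{$configName}")->toString();

  // Already deterministic: normalizing would not change any table name.
  if ($currentUuid === $deterministicUuid) {
    return FALSE;
  }

  $entityType = substr($data['entity_type'], 0, 32);
  $currentHash = substr(hash('sha256', $currentUuid), 0, 10);

  // Check the hashed data table as well as the hashed revision table.
  $schema = $database->schema();
  return $schema->tableExists($entityType . '__' . $currentHash)
    || $schema->tableExists($entityType . '_r__' . $currentHash);
}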

hook_field_storage_config_create(): set the deterministic UUID before Drupal creates the table. This prevents the problem entirely on new installs. When the field storage entity is being created for the first time, no tables exist yet. Set the deterministic UUID at creation time and the hash is computed from the deterministic UUID from the start. No conflict possible.

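function config_uuid_deterministic_field_storage_config_create(FieldStorageConfigInterface $entity) {
  // Config name, e.g. field.storage.node.field_name.
  $config_name = $entity->getEntityType()->getConfigPrefix() . '.' . $entity->id();

  if ($config_name === 'system.site') {
    return;
  }

  // Use the same name string as ConfigUuidGenerator::generate() so the UUID
  // set here matches the one the storage classes compute on write.
  $deterministic_uuid = Uuid::uuid5(ConfigUuidGenerator::NAMESPACE, "config:default:{$config_name}")->toString();
  $entity->set('uuid', $deterministic_uuid);
}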

Three layers of protection for one edge case. It took a failing kernel test to surface it, and four more commits to get the fix right.

Drush commands

drush cud:normalize --dry-run                     # Preview UUID changes without writing
drush cud:normalize --pattern="views.view.*"      # Normalize specific config patterns
drush cud:normalize --include-sync                # Also normalize the sync directory

The --dry-run flag is important. On an existing site with field storage configs that have hashed tables, you want to see which configs will be skipped before making changes. The normalizer reports every config name, the paths that changed, and whether UUIDs were actually rewritten.

Closing

The module is available on drupal.org: config_uuid_deterministic. The development environment and full commit history are on GitLab: config_uuid_deterministic_development.

Next: Part 3: Building a Drupal contrib module with AI-assisted TDD -- a functional specification written before any code, failing tests as the starting point for every feature, and three different AI coding tools used across the development timeline.

Niels de Feyter

Founder CodeLift
