State Lock Recovery

How to identify and resolve Terraform state lock errors.

Production

Symptoms

Error: Error acquiring the state lock

This means another Terraform operation is holding the state lock, or a previous operation crashed without releasing it.

Investigation

1. Check if Another Operation is Running

Verify that no other terraform apply or terraform plan is currently running in another terminal, in a CI pipeline, or on another team member's machine.

2. View Lock Information

task infra:show-lock

This scans the ontopix-tflocks DynamoDB table and shows any active locks, including:

  • Lock ID
  • Who holds the lock
  • When it was acquired
  • Operation in progress
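
For orientation, these fields correspond to what Terraform's S3 backend stores in a lock item's Info attribute as a JSON string. The sketch below decodes one such item in the typed form a DynamoDB scan returns. It assumes ontopix-tflocks is a standard S3-backend lock table. The helper name summarize_lock and the sample values are hypothetical, and the actual output format of task infra:show-lock is not shown in this runbook.

```python
import json


def summarize_lock(item):
    """Extract the fields of interest from one DynamoDB lock item.

    `item` uses DynamoDB's typed JSON form, e.g. {"Info": {"S": "..."}}.
    Returns None for items without an Info attribute, such as the
    checksum entries whose LockID ends in "-md5".
    """
    info_raw = item.get("Info", {}).get("S")
    if not info_raw:
        return None
    info = json.loads(info_raw)
    return {
        "lock_id": info.get("ID"),
        "who": info.get("Who"),
        "created": info.get("Created"),
        "operation": info.get("Operation"),
    }


# Invented sample item for illustration; real values will differ.
sample = {
    "LockID": {"S": "example-bucket/path/terraform.tfstate"},
    "Info": {"S": json.dumps({
        "ID": "00000000-0000-0000-0000-000000000000",
        "Operation": "OperationTypeApply",
        "Who": "alice@build-host",
        "Created": "2024-01-01T12:00:00Z",
        "Path": "example-bucket/path/terraform.tfstate",
    })},
}

print(summarize_lock(sample))
```

The lock_id value is what the force-unlock step expects.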

3. Determine if the Lock is Stale

A lock is stale if:

  • The process that acquired it has crashed or been terminated
  • The timestamp is old (more than a few minutes for plan, more than 10 minutes for apply)
  • You can confirm no one else is running Terraform operations
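
The timestamp criterion can be sketched as a small check. This is an illustration only: the 10-minute apply threshold comes from the list above, while the 5-minute plan threshold is an assumed reading of "a few minutes", and the function name is hypothetical. An old timestamp is only one signal; the other two criteria still need a human to confirm them.

```python
from datetime import datetime, timedelta, timezone

# 10 minutes for apply is taken from the criteria above.
# 5 minutes for plan is an ASSUMPTION standing in for "a few minutes".
AGE_THRESHOLDS = {
    "plan": timedelta(minutes=5),
    "apply": timedelta(minutes=10),
}


def exceeds_age_threshold(created, operation, now=None):
    """Return True if a lock acquired at `created` is older than the
    threshold for `operation` ("plan" or "apply").

    `created` and `now` must be timezone-aware datetimes.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - created > AGE_THRESHOLDS[operation]


# Example: an apply lock acquired 20 minutes before the reference time.
reference = datetime(2024, 1, 1, 12, 20, tzinfo=timezone.utc)
acquired = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(exceeds_age_threshold(acquired, "apply", now=reference))
```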

Resolution

If Another Operation is Running

Wait for it to complete. Do not force-unlock while another operation is in progress: releasing the lock lets a second operation write to the state concurrently, which can corrupt the state file.

If the Lock is Stale

Force-unlock using the lock ID from step 2:

LOCK_ID=<lock-id-from-show-lock> task infra:force-unlock
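
The task definition is not shown in this runbook; presumably it wraps Terraform's own force-unlock command. As a hedged sketch, the helper below (a hypothetical name) builds that command after checking that the lock ID looks like a UUID, which is the form the S3 backend uses. The UUID check is an assumption meant only to catch obvious paste errors, and the command is built but not executed.

```python
import re

# ASSUMPTION: lock IDs are UUIDs, as generated by Terraform's S3 backend.
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
)


def force_unlock_command(lock_id):
    """Return the argv list for `terraform force-unlock <lock_id>`.

    Raises ValueError if the ID does not look like a UUID. The -force
    flag is deliberately omitted so Terraform still asks for confirmation.
    """
    lock_id = lock_id.strip()
    if not UUID_RE.match(lock_id.lower()):
        raise ValueError(f"lock ID does not look like a UUID: {lock_id!r}")
    return ["terraform", "force-unlock", lock_id]


print(force_unlock_command("123e4567-e89b-12d3-a456-426614174000"))
```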

After Unlocking

Run a plan to verify state is consistent:

task infra:plan

If the plan shows unexpected changes, see Drift Investigation.
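
When the check is scripted, Terraform's -detailed-exitcode flag on terraform plan makes the outcome machine-readable: 0 means the plan succeeded with no changes, 1 means an error, and 2 means the plan succeeded with changes. The sketch below maps those codes to a short verdict. Whether task infra:plan passes this flag through is not stated here, so treat that as an assumption; the function name is hypothetical.

```python
def interpret_plan_exit_code(code):
    """Map the exit code of `terraform plan -detailed-exitcode` to a verdict."""
    if code == 0:
        return "no changes"
    if code == 2:
        return "changes present"
    # Exit code 1 is Terraform's error code; anything else is unexpected
    # and is treated the same way here.
    return "error"


for code in (0, 1, 2):
    print(code, interpret_plan_exit_code(code))
```

A verdict of "changes present" after an unlock is the case where the drift follow-up applies.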