Skip to content
GitHub
Database Issues

Concurrent-action deadlock from FK-child INSERT before parent FOR UPDATE

Concurrent-action deadlock from FK-child INSERT before parent FOR UPDATE

Problem

A use case meant to serialize concurrent operations on a row (so one wins and the loser gets a clean conflict) instead deadlocks under true concurrency: both transactions abort-or-block and Postgres kills one with DeadlockDetected, which surfaces to the client as an unhandled 500 rather than the intended clean 409.

Symptoms:

  • The integration test that fires two concurrent requests (asyncio.gather over two threaded clients, two independent DB connections) sees a 500, not the expected {200, 409}.
  • psycopg.errors.DeadlockDetected: deadlock detected — Process A waits for ShareLock on transaction Y; blocked by Process B. Process B waits for ShareLock on transaction X; blocked by Process A while locking a tuple in the parent table.
  • Runs clean sequentially; only fails when the two transactions genuinely interleave.

Root cause

The use case wrote a child row whose FK points at the row it was about to lock, in this order:

  1. INSERT into the child table (approval_decisions), whose candidate_id FK references the parent (rules). A foreign-key INSERT takes a FOR KEY SHARE lock on the referenced parent row to keep it from being key-changed or deleted mid-insert.
  2. SELECT ... FOR UPDATE on that same parent row to transition its state.

KEY SHARE locks are mutually compatible, so two concurrent transactions both acquire step-1’s KEY SHARE on the same parent row. Then each tries to upgrade to FOR UPDATE in step 2 — and each upgrade must wait for the other’s KEY SHARE to release. Neither can proceed → mutual-upgrade deadlock. Postgres detects it and aborts one transaction with SQLSTATE 40P01.

A deadlock-detector abort is not a “clean conflict”: it’s nondeterministic about the victim, log-noisy, and wastes the work both transactions did before the abort.

Solution

Acquire the row’s strongest lock first. Reorder the use case so the SELECT ... FOR UPDATE transition happens before the FK-locking child INSERT:

# Before (deadlocks): child INSERT takes KEY SHARE, then FOR UPDATE upgrades
decision = await self.decision_repository.record_decision(candidate_id=..., ...) # FK -> KEY SHARE on rules row
enshrined = await self.rule_repository.enshrine_candidate(candidate_id=...) # SELECT ... FOR UPDATE -> upgrade
# After (serializes cleanly): FOR UPDATE first; the child INSERT's KEY SHARE is a
# weaker lock the same transaction already dominates
enshrined = await self.rule_repository.enshrine_candidate(candidate_id=...) # SELECT ... FOR UPDATE
decision = await self.decision_repository.record_decision(candidate_id=..., ...) # FK KEY SHARE, no upgrade

Now two concurrent callers serialize on the FOR UPDATE: the winner takes the lock, transitions the row, inserts its child, and commits; the loser blocks on FOR UPDATE until the winner commits, then its under-lock precondition re-check sees the row already advanced and raises a domain conflict (CandidateNotApprovableError) that the route maps to a clean 409. The loser’s whole transaction (including its child INSERT) rolls back, so exactly one child row survives.

Implementation notes

  • The fix is a pure reordering within one transaction. It does not change which rows commit together — decision + transition + audit rows still commit atomically (or all roll back).
  • The authoritative serialization point is the repository’s SELECT ... FOR UPDATE + precondition re-check under the lock. An unlocked pre-read for before_state / validation is fine, but it is not the race guard.
  • Only operations that take FOR UPDATE need reordering. A sibling use case that only inserts the child row (no FOR UPDATE transition) takes just KEY SHARE and never upgrades, so it cannot deadlock this way — leave it alone.

Prevention

Best practices

  • When a use case both locks a row (FOR UPDATE) and inserts a child that FKs to that row, take the strongest lock first. Never let an FK INSERT’s implicit KEY SHARE precede a FOR UPDATE on the same parent row.
  • Treat “serialize at the row level → one wins, other gets a clean conflict” as a claim that must be proven by a true-concurrency test, not a sequential replay.

Warning signs

  • A use case docstring promises row-level serialization but the handler records a child/audit/decision row before it locks the parent.
  • A concurrency test uses sequential calls (replay) rather than asyncio.gather over independent connections — it will pass while hiding the deadlock.
# Fire two genuinely-concurrent requests; assert clean serialization, not a deadlock.
status_a, status_b = await asyncio.gather(asyncio.to_thread(_post), asyncio.to_thread(_post))
assert 500 not in {status_a, status_b}, f"deadlock regression (lock-order): {(status_a, status_b)}"
assert {status_a, status_b} == {200, 409}
# Exactly one child row survived the loser's rollback — the authoritative check.
assert len(await _decisions(conn, candidate_id=candidate_id)) == 1

References

  • SPEC-435 (Stream F W3) — operator action handlers; merge ee750a9
  • src/spectral/worlds/application/enshrinement/use_cases.pyApproveCandidate / RejectCandidate lock-ordering note
  • apps/api/tests/integration/test_candidate_action_routes.py::test_concurrent_approve_serializes_one_wins_other_409
  • Postgres row-level lock compatibility: FOR KEY SHARE vs FOR UPDATE (a FOR UPDATE upgrade conflicts with another txn’s FOR KEY SHARE)