Concurrent-action deadlock from FK-child INSERT before parent FOR UPDATE
Concurrent-action deadlock from FK-child INSERT before parent FOR UPDATE
Problem
A use case meant to serialize concurrent operations on a row (so one wins and the
loser gets a clean conflict) instead deadlocks under true concurrency: both
transactions abort-or-block and Postgres kills one with DeadlockDetected, which
surfaces to the client as an unhandled 500 rather than the intended clean 409.
Symptoms:
- The integration test that fires two concurrent requests (
asyncio.gatherover two threaded clients, two independent DB connections) sees a 500, not the expected{200, 409}. psycopg.errors.DeadlockDetected: deadlock detected — Process A waits for ShareLock on transaction Y; blocked by Process B. Process B waits for ShareLock on transaction X; blocked by Process Awhile locking a tuple in the parent table.- Runs clean sequentially; only fails when the two transactions genuinely interleave.
Root cause
The use case wrote a child row whose FK points at the row it was about to lock, in this order:
INSERTinto the child table (approval_decisions), whosecandidate_idFK references the parent (rules). A foreign-key INSERT takes aFOR KEY SHARElock on the referenced parent row to keep it from being key-changed or deleted mid-insert.SELECT ... FOR UPDATEon that same parent row to transition its state.
KEY SHARE locks are mutually compatible, so two concurrent transactions both
acquire step-1’s KEY SHARE on the same parent row. Then each tries to upgrade
to FOR UPDATE in step 2 — and each upgrade must wait for the other’s KEY SHARE to release. Neither can proceed → mutual-upgrade deadlock. Postgres
detects it and aborts one transaction with SQLSTATE 40P01.
A deadlock-detector abort is not a “clean conflict”: it’s nondeterministic about the victim, log-noisy, and wastes the work both transactions did before the abort.
Solution
Acquire the row’s strongest lock first. Reorder the use case so the
SELECT ... FOR UPDATE transition happens before the FK-locking child INSERT:
# Before (deadlocks): child INSERT takes KEY SHARE, then FOR UPDATE upgradesdecision = await self.decision_repository.record_decision(candidate_id=..., ...) # FK -> KEY SHARE on rules rowenshrined = await self.rule_repository.enshrine_candidate(candidate_id=...) # SELECT ... FOR UPDATE -> upgrade
# After (serializes cleanly): FOR UPDATE first; the child INSERT's KEY SHARE is a# weaker lock the same transaction already dominatesenshrined = await self.rule_repository.enshrine_candidate(candidate_id=...) # SELECT ... FOR UPDATEdecision = await self.decision_repository.record_decision(candidate_id=..., ...) # FK KEY SHARE, no upgradeNow two concurrent callers serialize on the FOR UPDATE: the winner takes the
lock, transitions the row, inserts its child, and commits; the loser blocks on
FOR UPDATE until the winner commits, then its under-lock precondition
re-check sees the row already advanced and raises a domain conflict
(CandidateNotApprovableError) that the route maps to a clean 409. The loser’s
whole transaction (including its child INSERT) rolls back, so exactly one child
row survives.
Implementation notes
- The fix is a pure reordering within one transaction. It does not change which rows commit together — decision + transition + audit rows still commit atomically (or all roll back).
- The authoritative serialization point is the repository’s
SELECT ... FOR UPDATE+ precondition re-check under the lock. An unlocked pre-read forbefore_state/ validation is fine, but it is not the race guard. - Only operations that take
FOR UPDATEneed reordering. A sibling use case that only inserts the child row (noFOR UPDATEtransition) takes justKEY SHAREand never upgrades, so it cannot deadlock this way — leave it alone.
Prevention
Best practices
- When a use case both locks a row (
FOR UPDATE) and inserts a child that FKs to that row, take the strongest lock first. Never let an FK INSERT’s implicitKEY SHAREprecede aFOR UPDATEon the same parent row. - Treat “serialize at the row level → one wins, other gets a clean conflict” as a claim that must be proven by a true-concurrency test, not a sequential replay.
Warning signs
- A use case docstring promises row-level serialization but the handler records a child/audit/decision row before it locks the parent.
- A concurrency test uses sequential calls (replay) rather than
asyncio.gatherover independent connections — it will pass while hiding the deadlock.
Related tests
# Fire two genuinely-concurrent requests; assert clean serialization, not a deadlock.status_a, status_b = await asyncio.gather(asyncio.to_thread(_post), asyncio.to_thread(_post))assert 500 not in {status_a, status_b}, f"deadlock regression (lock-order): {(status_a, status_b)}"assert {status_a, status_b} == {200, 409}# Exactly one child row survived the loser's rollback — the authoritative check.assert len(await _decisions(conn, candidate_id=candidate_id)) == 1References
- SPEC-435 (Stream F W3) — operator action handlers; merge
ee750a9 src/spectral/worlds/application/enshrinement/use_cases.py—ApproveCandidate/RejectCandidatelock-ordering noteapps/api/tests/integration/test_candidate_action_routes.py::test_concurrent_approve_serializes_one_wins_other_409- Postgres row-level lock compatibility:
FOR KEY SHAREvsFOR UPDATE(aFOR UPDATEupgrade conflicts with another txn’sFOR KEY SHARE)