concurrency-race-analysis - SKILL.md Agent Skill

name: concurrency-race-analysis
description: Detects and reasons about data races, deadlocks, livelocks, and memory-ordering bugs in concurrent and parallel code. Use when reviewing multithreaded code, lock-free structures, async pipelines, or when symptoms include intermittent corruption, heisenbugs, hangs, or load-dependent failures.

Concurrency & Race Analysis

Treat every shared mutable location as a potential race until proven otherwise. The bug is almost never where the crash surfaces.

Build the interleaving model first

Before reading the locking code, enumerate the actors and the shared state:

Shared cells: every variable, field, or resource touched by >1 thread/task. Mark read-only vs read-write.
Happens-before edges: what establishes ordering? (lock acquire/release, channel send/recv, join, atomic with acquire/release, memory fence). Absence of an edge = reorderable.
Atomicity scope: which groups of operations must be indivisible. Most bugs are "each op is atomic, but the composite isn't" (check-then-act, read-modify-write).
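
The "each op is atomic, but the composite isn't" case can be checked mechanically on a small model. The sketch below is illustrative only: it assumes a hypothetical shared counter, splits an increment into one atomic read step and one atomic write step, and enumerates every interleaving of two threads. The names `interleavings`, `run`, `T1`, and `T2` are invented for this example.

```python
def interleavings(a, b):
    """Yield every merge of step lists a and b that keeps each thread's own order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Execute one schedule of (thread, op) steps against a shared counter."""
    shared = {"counter": 0}
    local = {}  # per-thread register holding the value that thread last read
    for thread, op in schedule:
        if op == "read":
            local[thread] = shared["counter"]
        else:  # "write": store the stale-or-fresh local value plus one
            shared["counter"] = local[thread] + 1
    return shared["counter"]


# Each thread performs one increment, modelled as read-then-write.
t1 = [("T1", "read"), ("T1", "write")]
t2 = [("T2", "read"), ("T2", "write")]

# Group schedules by the final counter value they produce.
outcomes = {}
for schedule in interleavings(t1, t2):
    outcomes.setdefault(run(schedule), []).append(schedule)

for final_value, schedules in sorted(outcomes.items()):
    print(final_value, len(schedules))
```

In this model only the two fully serial schedules reach 2; the other four lose an update. That is the atomicity scope in concrete terms: the read and the write must sit in the same indivisible unit.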

Failure-mode taxonomy (check each explicitly)

Data race: unsynchronized concurrent accesses to the same location where ≥1 is a write. Under language memory models such as C11/C++11 this is UB, not just a stale read.
Atomicity violation: TOCTOU, lost update, double-checked locking without proper fences.
Order violation: B assumes A already ran (init-before-use across threads).
Deadlock: cyclic lock acquisition. Map the lock-order graph; any cycle is a latent deadlock.
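Livelock / starvation: threads stay busy without making forward progress; unfair locks that starve a waiter; retry backoff that keeps contenders colliding in lockstep.
ABA: a location changes from A to B and back to A between a thread's read and its CAS, so the CAS succeeds against stale state (typically a freed and reused pointer or a wrapped counter). Needs tagged (versioned) pointers or hazard pointers.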
Memory visibility: a write on T1 is observed late, out of order, or never by T2 because no acquire/release pair connects them.
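
The lock-order graph for the deadlock check can be built and searched mechanically. The following is a minimal sketch, assuming each code path has been summarized by hand as the sequence of lock names it acquires; `find_lock_cycle` and the lock names are hypothetical and not part of any library.

```python
def find_lock_cycle(acquisition_paths):
    """Return one cycle in the lock-order graph as a list of names, or None.

    acquisition_paths: iterable of sequences, each listing lock names in the
    order one code path acquires them while still holding the earlier ones.
    """
    # Edge held -> later means "later is acquired while held is still owned".
    graph = {}
    for path in acquisition_paths:
        for i, held in enumerate(path):
            graph.setdefault(held, set())
            for later in path[i + 1:]:
                graph[held].add(later)
                graph.setdefault(later, set())

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color = {node: WHITE for node in graph}
    stack = []

    def visit(node):
        color[node] = GREY
        stack.append(node)
        for nxt in sorted(graph[node]):
            if color[nxt] == GREY:  # back edge: nxt is already on the stack
                return stack[stack.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        stack.pop()
        color[node] = BLACK
        return None

    for node in sorted(graph):
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Two paths take the same pair of locks in opposite orders.
inverted = find_lock_cycle([["accounts", "audit_log"], ["audit_log", "accounts"]])
# Every path respects one global order, so no cycle exists.
consistent = find_lock_cycle([["a", "b"], ["a", "c"], ["b", "c"]])
```

A reported cycle is a latent deadlock even if no test run has ever hung, because the graph describes what is possible rather than what happened to be scheduled.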

Adversarial validation

For each shared cell, construct the worst interleaving by hand:

Preempt the thread between a read and its dependent write.
Run two writers simultaneously; is the result one of the inputs or garbage?
Assume the compiler/CPU reorders any two operations not separated by a fence. Does an invariant break?
For lock-free code: assume a thread stalls indefinitely mid-operation. Can others still make progress and stay correct?
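
"Preempt the thread between a read and its dependent write" can be forced rather than waited for. This sketch is an assumption-laden illustration: `Inventory`, `take_unsafe`, `take_locked`, and `race` are hypothetical names, and a `threading.Barrier` stands in for the scheduler pausing both threads at the worst possible point.

```python
import threading
import time


class Inventory:
    """Hypothetical shared cell: `stock` is read and written by two threads."""

    def __init__(self, stock):
        self.stock = stock
        self._lock = threading.Lock()

    def take_unsafe(self, preempt):
        if self.stock > 0:       # read (the check)
            preempt()            # injected preemption point
            self.stock -= 1      # dependent write (the act)
            return True
        return False

    def take_locked(self, preempt):
        with self._lock:         # check and act share one critical section
            if self.stock > 0:
                preempt()
                self.stock -= 1
                return True
            return False


def race(method_name, preempt):
    """Run two threads against an inventory holding a single item."""
    inv = Inventory(stock=1)
    results = []

    def worker():
        results.append(getattr(inv, method_name)(preempt))

    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results), inv.stock


# Both threads pass the check, then meet at the barrier before either writes.
barrier = threading.Barrier(2, timeout=5)
unsafe_results, unsafe_stock = race("take_unsafe", barrier.wait)

# With the lock, a pause inside the critical section cannot be exploited.
locked_results, locked_stock = race("take_locked", lambda: time.sleep(0.01))
```

In the unsafe run both callers succeed against a stock of one, which breaks the invariant that successful takes never exceed the initial stock. The barrier is only a test device for pinning the interleaving; it is not a suggested fix.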

Non-obvious traps

A single atomic (or Java volatile) makes one access atomic but does not make a multi-step invariant atomic. In C and C++, volatile provides neither atomicity nor inter-thread ordering.
Holding a lock while calling out to unknown code (callbacks, virtual dispatch) invites lock-order inversion.
"Mostly works under load testing" is the signature of a race, not evidence of correctness — interleavings are sampled, not exhausted.
Adding sleeps or retries does not fix a race; it narrows the window and moves the failure to a different timing or load profile.
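
One common way to avoid calling unknown code under a lock is to copy the state needed while holding the lock and invoke callbacks after releasing it. The sketch below assumes a hypothetical `EventBus` class. It uses re-entry into the same non-reentrant lock as the simplest instance of the problem; the same pattern also avoids inversions that involve a second lock taken inside the callback.

```python
import threading


class EventBus:
    """Hypothetical publisher whose listeners are arbitrary user callbacks."""

    def __init__(self):
        self._lock = threading.Lock()  # deliberately non-reentrant
        self._listeners = []

    def subscribe(self, fn):
        with self._lock:
            self._listeners.append(fn)

    def publish(self, event):
        with self._lock:
            snapshot = list(self._listeners)  # copy while holding the lock
        for fn in snapshot:                   # call out with no lock held
            fn(event)


bus = EventBus()
seen = []


def late(event):
    seen.append(("late", event))


def first(event):
    seen.append(("first", event))
    # Re-enters the bus from inside a callback. If publish() still held
    # _lock here, this call would block forever on the same thread.
    bus.subscribe(late)


bus.subscribe(first)
bus.publish("e1")
bus.publish("e2")
```

The trade-off is that a listener added during a publish only sees later events, and a listener removed during a publish may still receive the current one. That relaxed guarantee should be stated wherever this pattern is adopted.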

Output structure

For each finding report: shared cell → unsynchronized access pattern → concrete interleaving that breaks it → invariant violated → minimal fix (narrow the critical section, pick the right atomic ordering, or redesign to eliminate sharing). Prefer eliminating shared mutable state over adding locks.
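
The report shape above can be kept consistent by giving each finding a fixed record. This is a minimal sketch, assuming a hypothetical `Finding` dataclass whose field names mirror the five parts of the chain; the sample values are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    shared_cell: str
    access_pattern: str
    interleaving: tuple  # ordered steps of the breaking schedule
    invariant_violated: str
    minimal_fix: str

    def render(self):
        """Format the finding as plain text, one numbered line per step."""
        lines = [
            f"Shared cell: {self.shared_cell}",
            f"Access pattern: {self.access_pattern}",
            "Interleaving:",
        ]
        lines += [f"  {i}. {step}" for i, step in enumerate(self.interleaving, 1)]
        lines += [
            f"Invariant violated: {self.invariant_violated}",
            f"Minimal fix: {self.minimal_fix}",
        ]
        return "\n".join(lines)


finding = Finding(
    shared_cell="Inventory.stock",
    access_pattern="check-then-act without a lock",
    interleaving=(
        "T1 reads stock == 1",
        "T2 reads stock == 1",
        "T1 decrements",
        "T2 decrements",
    ),
    invariant_violated="successful takes <= initial stock",
    minimal_fix="hold one lock across the check and the decrement",
)
report = finding.render()
print(report)
```

Making the interleaving a required field forces each finding to name a concrete breaking schedule instead of a general suspicion.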