name: sqldeadlock-review description: Analyze SQL Server deadlock XML (from system_health XE session, SSMS deadlock graph, or trace) to identify root cause and produce a prioritized remediation plan. Applies 16 known deadlock patterns (P1–P16). Use when a deadlock monitor captures a graph or users report intermittent deadlock errors (error 1205). triggers: - /sqldeadlock-review - /deadlock - /deadlock-analyze
SQL Server Deadlock Analysis Skill
Purpose
Parse a SQL Server deadlock XML graph, identify the victim and winner processes, extract the queries and lock acquisition patterns involved, match against 16 known deadlock patterns (P1–P16), and produce a prioritized remediation plan.
Input
Accept any of:
- Raw
<deadlock>XML (from system_health XE session or SSMS deadlock graph Save As XML) - A file path to a
.xdlor.xmldeadlock graph file - A description of the deadlock if XML is not available
How to Run
- Parse the XML structure
- Extract process list (victim, winner, their queries, lock waits)
- Extract resource list (what locks are held and requested)
- Match against pattern library
- Generate remediation recommendations
XML Structure Reference
<deadlock>
<victim-list>
<victimProcess id="process1a2b" />
</victim-list>
<process-list>
<process id="process1a2b" taskpriority="0" logused="0"
waitresource="KEY: 5:72057594038910976 (abc123)"
waittime="4023" ownerId="123456"
transactionname="user_transaction"
currentdb="5" spid="52" kpid="1234"
status="suspended" isolationlevel="read committed">
<executionStack>
<frame procname="adhoc" line="3" stmtstart="100" stmtend="200"
sqlhandle="0x...">
UPDATE Orders SET Status = 1 WHERE Id = @id
</frame>
</executionStack>
<inputbuf>UPDATE Orders SET Status = 1 WHERE Id = @id</inputbuf>
</process>
...
</process-list>
<resource-list>
<keylock hobtid="72057594038910976" dbid="5" objectname="dbo.Orders"
indexname="PK_Orders" id="lock1" mode="X" associatedObjectId="...">
<owner-list>
<owner id="process2c3d" mode="X" />
</owner-list>
<waiter-list>
<waiter id="process1a2b" mode="U" requestType="wait" />
</waiter-list>
</keylock>
...
</resource-list>
</deadlock>
Extraction Checklist
For each process:
- Process ID, SPID, victim status (yes/no)
- Query text (from
<inputbuf>and<executionStack>) waitresource— what lock it is waiting fortransactionname— the transaction contextisolationlevel— READ COMMITTED, SNAPSHOT, SERIALIZABLE, etc.logused— how much log has been written (indicator of transaction size)
For each resource:
- Resource type:
keylock,pagelock,objectlock,ridlock,metadatalock - Object and index name
- Mode held by each owner (S, U, X, IS, IX, SIX)
- Mode requested by each waiter
Pattern Library (P1–P16)
P1 — Classic Forward/Reverse Access Order
- Signature: Process A holds X on resource R1, waits for resource R2. Process B holds X on R2, waits for R1.
- Severity: High
- Cause: Two transactions update the same pair of rows in opposite order.
- Fix: Enforce a consistent access order in application code (always update table A before table B, always process rows in ascending PK order).
P2 — Reader/Writer Deadlock (Shared vs Exclusive)
- Signature: Process A holds S lock (SELECT), waits for X. Process B holds X (UPDATE), waits for S to be released.
- Severity: High
- Cause: A long-running read transaction blocks a writer; another reader prevents the writer from completing, causing a cycle.
- Fix: Enable READ_COMMITTED_SNAPSHOT isolation (
ALTER DATABASE ... SET READ_COMMITTED_SNAPSHOT ON). Readers take no shared locks under RCSI — the most common fix for reader/writer deadlocks without changing application code.
P3 — Update Lock Escalation Deadlock
- Signature: Multiple processes hold U locks on different rows, each waiting for U on the other's row.
- Severity: High
- Cause:
UPDATEstatements taking U locks in different orders on the same table. - Fix: Add an index on the
WHEREclause columns so each update targets exactly one row (reduces lock scope). Consider usingWITH (ROWLOCK)hint. Consistent access order also applies.
P4 — Missing Index Causing Page Lock Escalation
- Signature:
objectlockorpagelockresource type (notkeylock) in the resource list. - Severity: High
- Cause: Without a row-level index, SQL Server takes page or table locks. Multiple transactions competing for the same page deadlock each other.
- Fix: Add a nonclustered index on the filter column so SQL Server takes row-level (
keylock) locks instead of page locks. Use thesqlindex-advisorskill if an execution plan is available.
P5 — Bookmark Lookup Deadlock (Key Lookup)
- Signature: Two
keylockresources: one on a nonclustered index, one on the clustered index (PK). Process A holds lock on NC index, waits for PK. Process B holds lock on PK, waits for NC index. - Severity: Medium
- Cause: A query does a Key Lookup (NC index → PK), taking locks on both. Another query updates via the PK, taking locks in reverse order.
- Fix: Eliminate the Key Lookup by adding INCLUDE columns to the NC index so no bookmark lookup is needed. This removes the two-resource lock acquisition.
P6 — SERIALIZABLE Phantom Deadlock
- Signature:
isolationlevel = serializableon one or more processes AND range locks (RangeX-X, RangeS-U) visible in the resource type. - Severity: Medium
- Cause: SERIALIZABLE isolation holds range locks to prevent phantoms. Two transactions holding range locks on adjacent ranges block each other's inserts.
- Fix: Downgrade to SNAPSHOT isolation if application semantics allow. SNAPSHOT uses optimistic row versioning and eliminates range locks entirely. If SERIALIZABLE is required, reduce transaction scope.
P7 — Foreign Key Check Deadlock
- Trigger:
objectnamein the resource list references a parent table, and one process is inserting into the child table while another deletes from the parent. - Severity: Medium
- Cause: Inserting into a child table takes a shared lock on the parent (FK validation). Deleting from the parent takes an exclusive lock. If done concurrently in opposite order, deadlock occurs.
- Fix: Add an index on the FK column in the child table (prevents table scan during FK validation). Ensure parent deletes and child inserts do not overlap in concurrent transactions.
P8 — Self-Deadlock (Single Process)
- Signature:
victim-listandprocess-listcontain only one process ID. - Severity: Medium
- Cause: A single SPID is requesting a lock it already holds in an incompatible mode (rare, usually triggered by cursors or certain MERGE statements).
- Fix: Rewrite the query to avoid cursor-based row-by-row processing. Review MERGE statements for self-deadlock edge cases documented in KB articles.
P9 — RCSI Reader Deadlock Despite RCSI Enabled
- Signature:
READ_COMMITTED_SNAPSHOTis ON for the database yet the deadlock involves a reader (S lock) and a writer (X lock) in a cycle. - Severity: High
- Cause: One or more sessions is using an isolation level that still takes shared locks despite RCSI — REPEATABLE READ, SERIALIZABLE, or an explicit
WITH (HOLDLOCK)/WITH (UPDLOCK)hint overrides RCSI for that statement. RCSI only removes S locks for READ COMMITTED; higher isolation levels retain them. - Fix: Identify the isolation level of the reader process (
isolationlevelattribute). If REPEATABLE READ or SERIALIZABLE is not required by the application, downgrade to READ COMMITTED. Remove unnecessaryWITH (HOLDLOCK)hints.
P10 — MERGE Statement Deadlock
- Signature: One or more processes has a MERGE statement in its
<executionStack>as the active frame. - Severity: High
- Cause: MERGE uses Halloween Protection — it must read all matching rows before writing any, which creates an internal spool. This spool acquires a mix of S and X locks in a pattern that can cycle with concurrent MERGE or DML on the same table.
- Fix: Replace MERGE with explicit INSERT/UPDATE/DELETE statements. If MERGE is required, add
WITH (TABLOCK)on the target as a short-term workaround (serializes all MERGE operations). Longer term, partition the target table or pre-stage source data to reduce concurrent overlap.
P11 — Heap Table RID Lock Deadlock
- Signature: Resource list contains
ridlockentries instead ofkeylockentries. - Severity: High
- Cause: The table has no clustered index (heap). SQL Server uses Row ID (RID) locks on heap pages. RID locks are coarser than key locks and more prone to conflicts because multiple rows share a page granularity boundary.
- Fix: Add a clustered index to convert the heap to a B-tree table. Row-level key locks replace RID locks, reducing deadlock surface. Use
/sqlindex-advisorto identify the best clustering key.
P12 — Distributed Transaction Deadlock
- Signature: One or more processes has
transactionnamecontaining "Distributed Transaction" ordtcStateattribute present. - Severity: High
- Cause: MS DTC coordinates a distributed transaction spanning multiple SQL Server instances or resource managers. Distributed transactions hold locks for the full duration of the two-phase commit protocol, which is significantly longer than local transactions. The extended lock hold time dramatically increases deadlock probability.
- Fix: Eliminate distributed transactions where possible by co-locating data on a single instance. If DTC is required, minimize transaction scope — commit or roll back as quickly as possible. Ensure DTC timeout settings are not artificially extended.
P13 — Multiple Deadlock Graphs: Shared Root Cause
- Signature: Multiple deadlock XML graphs are provided and all share the same
objectnameorindexnamein their resource lists. - Severity: High
- Cause: A single table or index is the hotspot for all deadlocks. This is not a set of independent incidents — it is one recurring access pattern that fires repeatedly under concurrency.
- Fix: Focus exclusively on the shared table. Apply the fix pattern appropriate for the individual deadlock type (P1–P12) detected in each graph. A single index addition or isolation level change to the shared table will resolve all graphs simultaneously.
P14 — TempDB Resource Deadlock
- Signature:
objectnamein the resource list is a tempdb object (tempdb.dbo.#tempor a system page like GAM, PFS, SGAM). - Severity: High
- Cause: Concurrent DDL on temp tables (CREATE/DROP) under high parallelism contends on TempDB allocation pages. Multiple sessions creating and dropping temp objects simultaneously fight over PFS and GAM pages. User temp table DML can also deadlock when two sessions update the same temp table rows.
- Fix: For allocation page contention, increase TempDB file count to match CPU count (up to 8 files), enable TF 1118 (SQL 2014–), or configure TempDB appropriately for the SQL Server version. For user temp table deadlocks, apply the same lock-order fixes as P1.
P15 — Lock Escalation Deadlock
- Signature: Resource list contains an
objectlock(table-level lock) alongsidekeylockorridlockentries on the same table, held by different sessions. - Severity: High
- Cause: SQL Server escalated row locks to a table lock for one session (threshold: 5,000 locks per table or per transaction). The table lock conflicts with row-level locks held by another concurrent session.
- Fix: Prevent escalation with
ALTER TABLE ... SET (LOCK_ESCALATION = DISABLE)if read isolation allows. Alternatively, reduce transaction size so the 5,000-lock threshold is never reached. Enable RCSI to eliminate S locks from readers, reducing total lock count.
P16 — Ledger or Temporal History Table Deadlock
- Signature:
objectnamein the resource list references a ledger history table (namedMSSQL_LedgerHistoryFor_...) or a temporal history table. Applies to SQL 2016+ (temporal) and SQL 2022+ (ledger). - Severity: Medium
- Cause: Ledger and temporal tables maintain hidden history rows on UPDATE/DELETE. The system inserts history rows into a separate table, acquiring locks in an order that can conflict with explicit DML on the base table from another session.
- Fix: Ensure application code does not hold locks on the base table for extended periods (keep transactions short). If possible, route read queries against the history table to a secondary replica to reduce read/write contention on the primary.
Lock Compatibility Reference
| Held \ Requested | S | U | X | IS | IX | SIX |
|---|---|---|---|---|---|---|
| S | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ |
| U | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ |
| X | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| IS | ✓ | ✓ | ✗ | ✓ | ✓ | ✗ |
| IX | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ |
| SIX | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ |
Version-Aware Check Suppression
If the SQL Server version is known — from the ServerVersion attribute in the plan XML or stated by the user — read VERSION_COMPATIBILITY.md (~/.claude/skills/VERSION_COMPATIBILITY.md if installed, or skills/VERSION_COMPATIBILITY.md from the repo). If unavailable, skip silently. For checks whose minimum version exceeds the instance version: verbose mode → log as SKIP (version: requires SQL 20XX+, instance is SQL 20YY); standard report → omit entirely. Do not suppress NOT ASSESSED rows from missing input — only suppress version-inapplicable checks.
Output Format
## Deadlock Analysis
### Deadlock Summary
| | Victim | Winner |
|--|--------|--------|
| **SPID** | X | Y |
| **Host** | [hostname] | [hostname] |
| **Procedure / Batch** | [proc name or first 80 chars of ad-hoc SQL, for display only] | [proc name or first 80 chars, for display only] |
| **Started** | [timestamp] | [timestamp] |
| **Log used** | [KB] | [KB] |
| **Pattern detected** | [P1–P8 or Unknown] | — |
### Lock Cycle
SPID X → holds [mode on object.index] → waits for [mode on object.index] SPID Y → holds [mode on object.index] → waits for [mode on object.index]
[One sentence confirming the circular wait and which SPID SQL Server chose as victim.]
### Pattern Match
**[Pattern name — e.g., P1 Classic Forward/Reverse Access Order]**
| Session | Step 1 | Step 2 |
|---------|--------|--------|
| SPID X — ProcA | [lock mode] on [Table1] ([index]) | Needs [lock mode] on [Table2] |
| SPID Y — ProcB | [lock mode] on [Table2] | Needs [lock mode] on [Table1] |
[One sentence explaining why this access order is deterministic and under what concurrency condition it fires.]
### Queries Involved
**Victim (SPID X) — [proc/batch name]**
```sql
[query text]
[One sentence: what lock it acquires and on which resource.]
Winner (SPID Y) — [proc/batch name]
[query text]
[One sentence: what lock it acquires first and what it then waits for.]
Root Cause
[Pattern name, why the cycle is deterministic, which tables/indexes are involved, and what concurrent execution condition triggers it.]
Remediation Plan
Fix 1 (Recommended) — [Action]
- Effort: Low / Medium / High
- Effectiveness: Eliminates / Reduces frequency / Hides symptom
- SQL: [DDL or setting change with code block if applicable]
Fix 2 — [Action] ...
Remediation Priority
| Fix | Effort | Effectiveness |
|---|---|---|
| Fix 1 — [name] | Low/Medium/High | Eliminates the deadlock |
| Fix 2 — [name] | Low | Reduces frequency; does not eliminate |
| Fix N — [name] | Low | Implement regardless as defensive coding |
Analyzed by: [state the AI model and version you are running as, e.g. "Claude Sonnet 4.6", "DeepSeek R1", "GPT-4o"] · [current date and time in the user's local timezone, or UTC if timezone is unknown, e.g. "2026-05-16 20:15 NZST"]
---
## Notes
- If more than one deadlock graph is provided, analyze each separately then note if they share the same root cause.
- If the graph is incomplete (SSMS sometimes truncates long query text), note which processes have truncated queries and base the analysis on what is visible.
- Do not recommend disabling deadlock retry logic in the application — this masks the problem. Fix the root cause.
- For high-frequency deadlocks (> 10/hour), an immediate mitigation is to enable READ_COMMITTED_SNAPSHOT while the permanent fix is implemented.
---
### Section: Output Filters (--brief / --critical-only)
**`--brief`** — Omit the Passed Checks table and attribution footer. Output the Summary, Findings, and Prioritized Fix Sequence sections only. Use when a quick scan of what fired is all that's needed.
**`--critical-only`** — Suppress Warning and Info findings. Show only Critical findings. The Passed Checks table is also omitted. Use when triaging an incident and only actionable blockers matter.
Both flags can be combined: `--brief --critical-only` produces the Summary section plus Critical findings only.
When neither flag is present, produce the full report as documented above.
---
### Section: Verbose Output (--verbose)
When the user's request includes `--verbose`, `--trace`, or the word `verbose`:
**1. Append a `## Check Evaluation Log` section** after the Passed Checks table.
Include one row for every check in this skill's ruleset, in check-ID order:
| Check | Evidence | Threshold | Result |
|-------|----------|-----------|--------|
| [ID — Name] | [key attribute(s) and value found, or "absent"] | [threshold or condition] | PASS / **FIRE → [severity]** / NOT ASSESSED |
Result conventions:
- `PASS` — attribute present, threshold not met
- `**FIRE → Critical/Warning/Info**` — threshold met; bold to distinguish from passes
- `NOT ASSESSED` — required attribute absent from input
**2. Save both files** to the current working directory using the Write tool:
output/<skill-name>/<YYYY-MM-DD-HHmmss>-<input-prefix>/analysis.md ← full report
output/<skill-name>/<YYYY-MM-DD-HHmmss>-<input-prefix>/trace.md ← Check Evaluation Log
Derive `<input-prefix>`:
1. Filename stem if a file path was provided (e.g. `horrible.sqlplan` → `horrible`)
2. First meaningful identifier from the artifact (top wait type, first table name, procedure name, etc.)
3. Fallback: `run`
Sanitize: alphanumeric + hyphens/underscores only, max 32 chars.
File headers:
analysis.md → `# Analysis — <skill-name> / # Input: <first 80 chars> / # Generated: <UTC timestamp>`
trace.md → `# Check Evaluation Log — <skill-name> / # Input: <first 80 chars> / # Generated: <UTC timestamp>`
Create directories as needed. When `--verbose` is not present, write nothing to disk.
---
## Companion Skills
- **sqlplan-review** — Analyze the execution plans of the deadlocked queries to identify missing indexes that extend lock hold time.
- **sqlindex-advisor** — Generate index DDL to eliminate the missing index patterns (P4, P5) that cause page-level lock escalation and bookmark lookup deadlocks.
- **tsql-review** — Review the T-SQL source of the deadlocked procedures for lock-order inconsistencies and missing TRY/CATCH (T19, T20).
- **sqltrace-review** — If the trace includes Lock:Timeout (X6) or repeated deadlock events, use sqltrace-review to quantify frequency and identify peak periods.
- **sqlquerystore-review** — Analyze Query Store data to find regressed queries, plan instability, and the top resource consumers across the whole workload. Use after running a workload capture to prioritize which queries to tune with /sqlplan-review.
- **mssql-performance-review** — Orchestrator that routes mixed artifacts to multiple specialised skills (this one included), runs an adversarial root-cause check, and produces a single consolidated report with evidence chain, risk-rated fixes, and rollback. Use when you have several artifact types together or describe a symptom without knowing which skill to run.