sqldeadlock-review

star 1

Analyze SQL Server deadlock XML (from system_health XE session, SSMS deadlock graph, or trace) to identify root cause and produce a prioritized remediation plan. Applies 16 known deadlock patterns (P1–P16). Use when a deadlock monitor captures a graph or users report intermittent deadlock errors (error 1205).

vanterx By vanterx schedule Updated 6/5/2026

name: sqldeadlock-review description: Analyze SQL Server deadlock XML (from system_health XE session, SSMS deadlock graph, or trace) to identify root cause and produce a prioritized remediation plan. Applies 16 known deadlock patterns (P1–P16). Use when a deadlock monitor captures a graph or users report intermittent deadlock errors (error 1205). triggers: - /sqldeadlock-review - /deadlock - /deadlock-analyze

SQL Server Deadlock Analysis Skill

Purpose

Parse a SQL Server deadlock XML graph, identify the victim and winner processes, extract the queries and lock acquisition patterns involved, match against 16 known deadlock patterns (P1–P16), and produce a prioritized remediation plan.

Input

Accept any of:

  • Raw <deadlock> XML (from system_health XE session or SSMS deadlock graph Save As XML)
  • A file path to a .xdl or .xml deadlock graph file
  • A description of the deadlock if XML is not available

How to Run

  1. Parse the XML structure
  2. Extract process list (victim, winner, their queries, lock waits)
  3. Extract resource list (what locks are held and requested)
  4. Match against pattern library
  5. Generate remediation recommendations

XML Structure Reference

<deadlock>
  <victim-list>
    <victimProcess id="process1a2b" />
  </victim-list>
  <process-list>
    <process id="process1a2b" taskpriority="0" logused="0"
             waitresource="KEY: 5:72057594038910976 (abc123)"
             waittime="4023" ownerId="123456"
             transactionname="user_transaction"
             currentdb="5" spid="52" kpid="1234"
             status="suspended" isolationlevel="read committed">
      <executionStack>
        <frame procname="adhoc" line="3" stmtstart="100" stmtend="200"
               sqlhandle="0x...">
          UPDATE Orders SET Status = 1 WHERE Id = @id
        </frame>
      </executionStack>
      <inputbuf>UPDATE Orders SET Status = 1 WHERE Id = @id</inputbuf>
    </process>
    ...
  </process-list>
  <resource-list>
    <keylock hobtid="72057594038910976" dbid="5" objectname="dbo.Orders"
             indexname="PK_Orders" id="lock1" mode="X" associatedObjectId="...">
      <owner-list>
        <owner id="process2c3d" mode="X" />
      </owner-list>
      <waiter-list>
        <waiter id="process1a2b" mode="U" requestType="wait" />
      </waiter-list>
    </keylock>
    ...
  </resource-list>
</deadlock>

Extraction Checklist

For each process:

  • Process ID, SPID, victim status (yes/no)
  • Query text (from <inputbuf> and <executionStack>)
  • waitresource — what lock it is waiting for
  • transactionname — the transaction context
  • isolationlevel — READ COMMITTED, SNAPSHOT, SERIALIZABLE, etc.
  • logused — how much log has been written (indicator of transaction size)

For each resource:

  • Resource type: keylock, pagelock, objectlock, ridlock, metadatalock
  • Object and index name
  • Mode held by each owner (S, U, X, IS, IX, SIX)
  • Mode requested by each waiter

Pattern Library (P1–P16)

P1 — Classic Forward/Reverse Access Order

  • Signature: Process A holds X on resource R1, waits for resource R2. Process B holds X on R2, waits for R1.
  • Severity: High
  • Cause: Two transactions update the same pair of rows in opposite order.
  • Fix: Enforce a consistent access order in application code (always update table A before table B, always process rows in ascending PK order).

P2 — Reader/Writer Deadlock (Shared vs Exclusive)

  • Signature: Process A holds S lock (SELECT), waits for X. Process B holds X (UPDATE), waits for S to be released.
  • Severity: High
  • Cause: A long-running read transaction blocks a writer; another reader prevents the writer from completing, causing a cycle.
  • Fix: Enable READ_COMMITTED_SNAPSHOT isolation (ALTER DATABASE ... SET READ_COMMITTED_SNAPSHOT ON). Readers take no shared locks under RCSI — the most common fix for reader/writer deadlocks without changing application code.

P3 — Update Lock Escalation Deadlock

  • Signature: Multiple processes hold U locks on different rows, each waiting for U on the other's row.
  • Severity: High
  • Cause: UPDATE statements taking U locks in different orders on the same table.
  • Fix: Add an index on the WHERE clause columns so each update targets exactly one row (reduces lock scope). Consider using WITH (ROWLOCK) hint. Consistent access order also applies.

P4 — Missing Index Causing Page Lock Escalation

  • Signature: objectlock or pagelock resource type (not keylock) in the resource list.
  • Severity: High
  • Cause: Without a row-level index, SQL Server takes page or table locks. Multiple transactions competing for the same page deadlock each other.
  • Fix: Add a nonclustered index on the filter column so SQL Server takes row-level (keylock) locks instead of page locks. Use the sqlindex-advisor skill if an execution plan is available.

P5 — Bookmark Lookup Deadlock (Key Lookup)

  • Signature: Two keylock resources: one on a nonclustered index, one on the clustered index (PK). Process A holds lock on NC index, waits for PK. Process B holds lock on PK, waits for NC index.
  • Severity: Medium
  • Cause: A query does a Key Lookup (NC index → PK), taking locks on both. Another query updates via the PK, taking locks in reverse order.
  • Fix: Eliminate the Key Lookup by adding INCLUDE columns to the NC index so no bookmark lookup is needed. This removes the two-resource lock acquisition.

P6 — SERIALIZABLE Phantom Deadlock

  • Signature: isolationlevel = serializable on one or more processes AND range locks (RangeX-X, RangeS-U) visible in the resource type.
  • Severity: Medium
  • Cause: SERIALIZABLE isolation holds range locks to prevent phantoms. Two transactions holding range locks on adjacent ranges block each other's inserts.
  • Fix: Downgrade to SNAPSHOT isolation if application semantics allow. SNAPSHOT uses optimistic row versioning and eliminates range locks entirely. If SERIALIZABLE is required, reduce transaction scope.

P7 — Foreign Key Check Deadlock

  • Trigger: objectname in the resource list references a parent table, and one process is inserting into the child table while another deletes from the parent.
  • Severity: Medium
  • Cause: Inserting into a child table takes a shared lock on the parent (FK validation). Deleting from the parent takes an exclusive lock. If done concurrently in opposite order, deadlock occurs.
  • Fix: Add an index on the FK column in the child table (prevents table scan during FK validation). Ensure parent deletes and child inserts do not overlap in concurrent transactions.

P8 — Self-Deadlock (Single Process)

  • Signature: victim-list and process-list contain only one process ID.
  • Severity: Medium
  • Cause: A single SPID is requesting a lock it already holds in an incompatible mode (rare, usually triggered by cursors or certain MERGE statements).
  • Fix: Rewrite the query to avoid cursor-based row-by-row processing. Review MERGE statements for self-deadlock edge cases documented in KB articles.

P9 — RCSI Reader Deadlock Despite RCSI Enabled

  • Signature: READ_COMMITTED_SNAPSHOT is ON for the database yet the deadlock involves a reader (S lock) and a writer (X lock) in a cycle.
  • Severity: High
  • Cause: One or more sessions is using an isolation level that still takes shared locks despite RCSI — REPEATABLE READ, SERIALIZABLE, or an explicit WITH (HOLDLOCK) / WITH (UPDLOCK) hint overrides RCSI for that statement. RCSI only removes S locks for READ COMMITTED; higher isolation levels retain them.
  • Fix: Identify the isolation level of the reader process (isolationlevel attribute). If REPEATABLE READ or SERIALIZABLE is not required by the application, downgrade to READ COMMITTED. Remove unnecessary WITH (HOLDLOCK) hints.

P10 — MERGE Statement Deadlock

  • Signature: One or more processes has a MERGE statement in its <executionStack> as the active frame.
  • Severity: High
  • Cause: MERGE uses Halloween Protection — it must read all matching rows before writing any, which creates an internal spool. This spool acquires a mix of S and X locks in a pattern that can cycle with concurrent MERGE or DML on the same table.
  • Fix: Replace MERGE with explicit INSERT/UPDATE/DELETE statements. If MERGE is required, add WITH (TABLOCK) on the target as a short-term workaround (serializes all MERGE operations). Longer term, partition the target table or pre-stage source data to reduce concurrent overlap.

P11 — Heap Table RID Lock Deadlock

  • Signature: Resource list contains ridlock entries instead of keylock entries.
  • Severity: High
  • Cause: The table has no clustered index (heap). SQL Server uses Row ID (RID) locks on heap pages. RID locks are coarser than key locks and more prone to conflicts because multiple rows share a page granularity boundary.
  • Fix: Add a clustered index to convert the heap to a B-tree table. Row-level key locks replace RID locks, reducing deadlock surface. Use /sqlindex-advisor to identify the best clustering key.

P12 — Distributed Transaction Deadlock

  • Signature: One or more processes has transactionname containing "Distributed Transaction" or dtcState attribute present.
  • Severity: High
  • Cause: MS DTC coordinates a distributed transaction spanning multiple SQL Server instances or resource managers. Distributed transactions hold locks for the full duration of the two-phase commit protocol, which is significantly longer than local transactions. The extended lock hold time dramatically increases deadlock probability.
  • Fix: Eliminate distributed transactions where possible by co-locating data on a single instance. If DTC is required, minimize transaction scope — commit or roll back as quickly as possible. Ensure DTC timeout settings are not artificially extended.

P13 — Multiple Deadlock Graphs: Shared Root Cause

  • Signature: Multiple deadlock XML graphs are provided and all share the same objectname or indexname in their resource lists.
  • Severity: High
  • Cause: A single table or index is the hotspot for all deadlocks. This is not a set of independent incidents — it is one recurring access pattern that fires repeatedly under concurrency.
  • Fix: Focus exclusively on the shared table. Apply the fix pattern appropriate for the individual deadlock type (P1–P12) detected in each graph. A single index addition or isolation level change to the shared table will resolve all graphs simultaneously.

P14 — TempDB Resource Deadlock

  • Signature: objectname in the resource list is a tempdb object (tempdb.dbo.#temp or a system page like GAM, PFS, SGAM).
  • Severity: High
  • Cause: Concurrent DDL on temp tables (CREATE/DROP) under high parallelism contends on TempDB allocation pages. Multiple sessions creating and dropping temp objects simultaneously fight over PFS and GAM pages. User temp table DML can also deadlock when two sessions update the same temp table rows.
  • Fix: For allocation page contention, increase TempDB file count to match CPU count (up to 8 files), enable TF 1118 (SQL 2014–), or configure TempDB appropriately for the SQL Server version. For user temp table deadlocks, apply the same lock-order fixes as P1.

P15 — Lock Escalation Deadlock

  • Signature: Resource list contains an objectlock (table-level lock) alongside keylock or ridlock entries on the same table, held by different sessions.
  • Severity: High
  • Cause: SQL Server escalated row locks to a table lock for one session (threshold: 5,000 locks per table or per transaction). The table lock conflicts with row-level locks held by another concurrent session.
  • Fix: Prevent escalation with ALTER TABLE ... SET (LOCK_ESCALATION = DISABLE) if read isolation allows. Alternatively, reduce transaction size so the 5,000-lock threshold is never reached. Enable RCSI to eliminate S locks from readers, reducing total lock count.

P16 — Ledger or Temporal History Table Deadlock

  • Signature: objectname in the resource list references a ledger history table (named MSSQL_LedgerHistoryFor_...) or a temporal history table. Applies to SQL 2016+ (temporal) and SQL 2022+ (ledger).
  • Severity: Medium
  • Cause: Ledger and temporal tables maintain hidden history rows on UPDATE/DELETE. The system inserts history rows into a separate table, acquiring locks in an order that can conflict with explicit DML on the base table from another session.
  • Fix: Ensure application code does not hold locks on the base table for extended periods (keep transactions short). If possible, route read queries against the history table to a secondary replica to reduce read/write contention on the primary.

Lock Compatibility Reference

Held \ Requested S U X IS IX SIX
S
U
X
IS
IX
SIX

Version-Aware Check Suppression

If the SQL Server version is known — from the ServerVersion attribute in the plan XML or stated by the user — read VERSION_COMPATIBILITY.md (~/.claude/skills/VERSION_COMPATIBILITY.md if installed, or skills/VERSION_COMPATIBILITY.md from the repo). If unavailable, skip silently. For checks whose minimum version exceeds the instance version: verbose mode → log as SKIP (version: requires SQL 20XX+, instance is SQL 20YY); standard report → omit entirely. Do not suppress NOT ASSESSED rows from missing input — only suppress version-inapplicable checks.


Output Format

## Deadlock Analysis

### Deadlock Summary

| | Victim | Winner |
|--|--------|--------|
| **SPID** | X | Y |
| **Host** | [hostname] | [hostname] |
| **Procedure / Batch** | [proc name or first 80 chars of ad-hoc SQL, for display only] | [proc name or first 80 chars, for display only] |
| **Started** | [timestamp] | [timestamp] |
| **Log used** | [KB] | [KB] |
| **Pattern detected** | [P1–P8 or Unknown] | — |

### Lock Cycle

SPID X → holds [mode on object.index] → waits for [mode on object.index] SPID Y → holds [mode on object.index] → waits for [mode on object.index]


[One sentence confirming the circular wait and which SPID SQL Server chose as victim.]

### Pattern Match

**[Pattern name — e.g., P1 Classic Forward/Reverse Access Order]**

| Session | Step 1 | Step 2 |
|---------|--------|--------|
| SPID X — ProcA | [lock mode] on [Table1] ([index]) | Needs [lock mode] on [Table2] |
| SPID Y — ProcB | [lock mode] on [Table2] | Needs [lock mode] on [Table1] |

[One sentence explaining why this access order is deterministic and under what concurrency condition it fires.]

### Queries Involved

**Victim (SPID X) — [proc/batch name]**
```sql
[query text]

[One sentence: what lock it acquires and on which resource.]

Winner (SPID Y) — [proc/batch name]

[query text]

[One sentence: what lock it acquires first and what it then waits for.]

Root Cause

[Pattern name, why the cycle is deterministic, which tables/indexes are involved, and what concurrent execution condition triggers it.]

Remediation Plan

Fix 1 (Recommended) — [Action]

  • Effort: Low / Medium / High
  • Effectiveness: Eliminates / Reduces frequency / Hides symptom
  • SQL: [DDL or setting change with code block if applicable]

Fix 2 — [Action] ...

Remediation Priority

Fix Effort Effectiveness
Fix 1 — [name] Low/Medium/High Eliminates the deadlock
Fix 2 — [name] Low Reduces frequency; does not eliminate
Fix N — [name] Low Implement regardless as defensive coding

Analyzed by: [state the AI model and version you are running as, e.g. "Claude Sonnet 4.6", "DeepSeek R1", "GPT-4o"] · [current date and time in the user's local timezone, or UTC if timezone is unknown, e.g. "2026-05-16 20:15 NZST"]


---

## Notes

- If more than one deadlock graph is provided, analyze each separately then note if they share the same root cause.
- If the graph is incomplete (SSMS sometimes truncates long query text), note which processes have truncated queries and base the analysis on what is visible.
- Do not recommend disabling deadlock retry logic in the application — this masks the problem. Fix the root cause.
- For high-frequency deadlocks (> 10/hour), an immediate mitigation is to enable READ_COMMITTED_SNAPSHOT while the permanent fix is implemented.

---

### Section: Output Filters (--brief / --critical-only)

**`--brief`** — Omit the Passed Checks table and attribution footer. Output the Summary, Findings, and Prioritized Fix Sequence sections only. Use when a quick scan of what fired is all that's needed.

**`--critical-only`** — Suppress Warning and Info findings. Show only Critical findings. The Passed Checks table is also omitted. Use when triaging an incident and only actionable blockers matter.

Both flags can be combined: `--brief --critical-only` produces the Summary section plus Critical findings only.

When neither flag is present, produce the full report as documented above.

---

### Section: Verbose Output (--verbose)

When the user's request includes `--verbose`, `--trace`, or the word `verbose`:

**1. Append a `## Check Evaluation Log` section** after the Passed Checks table.

Include one row for every check in this skill's ruleset, in check-ID order:

| Check | Evidence | Threshold | Result |
|-------|----------|-----------|--------|
| [ID — Name] | [key attribute(s) and value found, or "absent"] | [threshold or condition] | PASS / **FIRE → [severity]** / NOT ASSESSED |

Result conventions:
- `PASS` — attribute present, threshold not met
- `**FIRE → Critical/Warning/Info**` — threshold met; bold to distinguish from passes
- `NOT ASSESSED` — required attribute absent from input

**2. Save both files** to the current working directory using the Write tool:

  output/<skill-name>/<YYYY-MM-DD-HHmmss>-<input-prefix>/analysis.md  ← full report
  output/<skill-name>/<YYYY-MM-DD-HHmmss>-<input-prefix>/trace.md     ← Check Evaluation Log

Derive `<input-prefix>`:
1. Filename stem if a file path was provided (e.g. `horrible.sqlplan` → `horrible`)
2. First meaningful identifier from the artifact (top wait type, first table name, procedure name, etc.)
3. Fallback: `run`
Sanitize: alphanumeric + hyphens/underscores only, max 32 chars.

File headers:
  analysis.md → `# Analysis — <skill-name> / # Input: <first 80 chars> / # Generated: <UTC timestamp>`
  trace.md    → `# Check Evaluation Log — <skill-name> / # Input: <first 80 chars> / # Generated: <UTC timestamp>`

Create directories as needed. When `--verbose` is not present, write nothing to disk.

---

## Companion Skills

- **sqlplan-review** — Analyze the execution plans of the deadlocked queries to identify missing indexes that extend lock hold time.
- **sqlindex-advisor** — Generate index DDL to eliminate the missing index patterns (P4, P5) that cause page-level lock escalation and bookmark lookup deadlocks.
- **tsql-review** — Review the T-SQL source of the deadlocked procedures for lock-order inconsistencies and missing TRY/CATCH (T19, T20).
- **sqltrace-review** — If the trace includes Lock:Timeout (X6) or repeated deadlock events, use sqltrace-review to quantify frequency and identify peak periods.
- **sqlquerystore-review** — Analyze Query Store data to find regressed queries, plan instability, and the top resource consumers across the whole workload. Use after running a workload capture to prioritize which queries to tune with /sqlplan-review.

- **mssql-performance-review** — Orchestrator that routes mixed artifacts to multiple specialised skills (this one included), runs an adversarial root-cause check, and produces a single consolidated report with evidence chain, risk-rated fixes, and rollback. Use when you have several artifact types together or describe a symptom without knowing which skill to run.
Install via CLI
npx skills add https://github.com/vanterx/mssql-performance-skills --skill sqldeadlock-review
Repository Details
star Stars 1
call_split Forks 0
navigation Branch main
article Path SKILL.md
More from Creator