sqlmemory-review - SKILL.md Agent Skill

name: sqlmemory-review description: Analyze SQL Server memory pressure using buffer pool metrics, plan cache composition, memory grants, and memory clerk data. Applies 20 checks (O1–O20) covering Page Life Expectancy degradation, single-use plan cache bloat, RESOURCE_SEMAPHORE queue depth, memory grant timeouts, buffer pool concentration, ColumnStore and In-Memory OLTP footprint, OS memory pressure notifications, and server memory configuration. Use this skill when the server is paging, queries queue for memory grants, or PLE is low and dropping. Trigger when pasting output from sys.dm_os_memory_clerks, sys.dm_os_ring_buffers, sys.dm_exec_query_memory_grants, or PLE perf counters. triggers: - /sqlmemory-review - /memory-review - /ple-check

SQL Server Memory Review Skill

Purpose

Analyze SQL Server memory state and identify the root cause of memory pressure. Applies 20 checks (O1–O20) across four categories:

O1–O5 — Buffer pool and Page Life Expectancy: detect low PLE, declining trends, NUMA node imbalance, and buffer pool concentration in a single database
O6–O10 — Plan cache: single-use plan bloat, excessive compile counts, large individual plans, and high plan cache churn
O11–O15 — Memory grants and RESOURCE_SEMAPHORE: detect grant queuing, grant timeouts, oversized grants, and Resource Governor misconfigurations
O16–O20 — Memory clerks, OS pressure, and configuration: ColumnStore/In-Memory OLTP memory footprint, OS pressure notifications, stolen (non-buffer) memory dominance, Lock Pages in Memory misconfiguration, and Max Server Memory not explicitly set

Input

Accept any of:

Output from sys.dm_os_memory_clerks capture query below (paste the result grid)
Output from sys.dm_os_ring_buffers WHERE ring_buffer_type = N'RING_BUFFER_RESOURCE_MONITOR' — memory pressure notifications (also accept RING_BUFFER_OOM records for out-of-memory events)
Output from sys.dm_exec_query_memory_grants for current grant queue state
PLE counter values from sys.dm_os_performance_counters or SSMS Activity Monitor
Output from sys.dm_os_sys_memory for OS-level memory state
Combined paste of two or more of the above; apply all applicable checks
A natural language description of symptoms ("PLE is 200 and dropping, RESOURCE_SEMAPHORE is 15% of waits")

Recommended capture queries

-- 1. Memory clerks — top consumers (paste top 20+ rows)
SELECT TOP 20
    type,
    name,
    memory_node_id,
    pages_kb,
    virtual_memory_reserved_kb,
    virtual_memory_committed_kb,
    awe_allocated_kb,
    shared_memory_reserved_kb,
    shared_memory_committed_kb
FROM sys.dm_os_memory_clerks
ORDER BY pages_kb DESC;

-- 2. Page Life Expectancy (PLE)
SELECT object_name, counter_name, instance_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Buffer Manager%'
  AND counter_name = 'Page life expectancy';

-- 3. Plan cache single-use waste
SELECT
    SUM(CASE WHEN usecounts = 1 THEN size_in_bytes ELSE 0 END) / 1048576 AS single_use_mb,
    SUM(size_in_bytes) / 1048576                                            AS total_plan_cache_mb,
    COUNT(*)                                                                AS total_plans,
    SUM(CASE WHEN usecounts = 1 THEN 1 ELSE 0 END)                         AS single_use_plans
FROM sys.dm_exec_cached_plans
WHERE objtype IN ('Adhoc', 'Prepared');

-- 4. Memory grant queue (current requests waiting)
SELECT
    session_id,
    request_id,
    scheduler_id,
    grant_time,
    requested_memory_kb,
    granted_memory_kb,
    required_memory_kb,
    used_memory_kb,
    max_used_memory_kb,
    query_cost,
    timeout_sec,
    resource_semaphore_id,
    wait_order,
    is_next_candidate
FROM sys.dm_exec_query_memory_grants
ORDER BY wait_order;

-- 5. OS memory state
SELECT
    total_physical_memory_kb,
    available_physical_memory_kb,
    total_page_file_kb,
    available_page_file_kb,
    system_memory_state_desc
FROM sys.dm_os_sys_memory;

Thresholds Reference

Metric	Info	Warning	Critical
PLE (single NUMA node)	≥ 300 s	< 300 s	< 60 s
PLE (multi-NUMA, per node)	≥ 300 s	< 300 s	< 60 s
PLE decline rate (trend)	< 10 s/min	≥ 10 s/min	≥ 60 s/min
Single-use plan cache as % of total plan cache	< 30%	≥ 30%	≥ 60%
RESOURCE_SEMAPHORE wait (from sqlwait-review)	0 sessions	1–5 queued	> 5 queued
Memory grant timeout	0	Any	—
Stolen memory (non-buffer) as % of target	< 15%	≥ 15%	≥ 30%
Buffer pool: one DB as % of pool	< 60%	≥ 60%	≥ 80%
ColumnStore pool (COLUMNSTORE_OBJECT_POOL)	—	> 25%	> 50%
In-Memory OLTP (XTP) memory	—	> 25%	> 50%

Buffer Pool and PLE Checks (O1–O5)

Run these first to determine if SQL Server is under immediate memory pressure.

O1 — Low Page Life Expectancy

Trigger: Any cntr_value for Page life expectancy in sys.dm_os_performance_counters (object Buffer Manager or Buffer Node) < 300 seconds
Severity: Warning if < 300 s; Critical if < 60 s
Fix: PLE below 300 means pages cycle out of the buffer pool faster than expected (the traditional rule of thumb: 300 s per 4 GB of buffer pool, so a 128 GB server warrants PLE > ~9,600 s). Investigate: (1) run O4 to check if one database dominates the pool; (2) check if a scan-heavy query is flooding the pool — run /sqltrace-review to identify full-scan workloads; (3) check if total committed memory is near Max Server Memory — run O20 to verify the setting.

O2 — PLE Declining Trend

Trigger: Multiple PLE snapshots show a monotonically decreasing pattern over ≥ 3 data points, with a rate of decline ≥ 10 s per minute
Severity: Warning if 10–59 s/min decline; Critical if ≥ 60 s/min decline
Fix: A sustained PLE drop means the buffer pool is being churned. Identify the source: correlate PLE drop timestamps with batch execution times in /sqltrace-review or Query Store. Common causes: a newly deployed query causing a large table scan, a growing database whose working set no longer fits in memory, or a maintenance job (REBUILD, DBCC CHECKDB) running without WITH TABLOCK / WITH PHYSICAL_ONLY limits.

O3 — NUMA Node PLE Imbalance

Trigger: When multiple Buffer Node PLE counters are present (multi-NUMA system), any node's PLE is < 50% of the maximum node PLE
Severity: Warning
Fix: One NUMA node is under higher memory pressure than others. This suggests either: uneven memory configuration across NUMA nodes (check BIOS / hardware), or queries are bound to specific schedulers/CPUs that concentrate working sets. Check sys.dm_os_nodes for memory allocation per node. Consider soft-NUMA if physical NUMA is unavailable.

O4 — Buffer Pool Dominated by Single Database

Trigger: A single database holds ≥ 60% of all pages in the buffer pool (use sys.dm_os_buffer_descriptors grouped by database_id)
Severity: Warning if ≥ 60%; Critical if ≥ 80%
Fix: One database is evicting all others from the buffer pool. If this is the intended primary OLTP database, increase Max Server Memory (run O20). If it is unexpected (reporting DB, audit DB), consider: (1) partitioning workloads to separate SQL instances; (2) using Resource Governor with memory limits; (3) scheduling large scan queries during off-peak hours.

O5 — Stolen Memory Excessive

Trigger: Sum of all non-buffer-pool clerk pages_kb ≥ 15% of committed_target_kb from sys.dm_os_sys_info
Severity: Warning if 15–29%; Critical if ≥ 30%
Fix: Stolen (non-buffer) memory includes locks, SQL CLR, linked servers, plan cache, ColumnStore/XTP pools, and third-party extended procedures. Run O9 (locks clerk), O16 (ColumnStore), and O17 (XTP) checks. If CLR or linked servers are involved, review their configuration. Stolen memory leaves less room for the buffer pool, causing the PLE symptoms seen in O1.

Plan Cache Checks (O6–O10)

O6 — Single-Use Plan Cache Bloat

Trigger: Single-use plans represent ≥ 30% of total plan cache size OR total single-use plan count > 10,000
Severity: Warning if 30–59%; Critical if ≥ 60%
Fix: Ad-hoc queries that run once generate plans that consume memory and are never reused. Enable optimize for ad hoc workloads (sp_configure 'optimize for ad hoc workloads', 1) to store only a stub on first execution, saving the full plan allocation until the query runs a second time. Longer term, parameterize application queries or use sp_executesql with parameters.

O7 — High Plan Cache Compilation Rate

Trigger: SQL Compilations/sec or SQL Re-Compilations/sec from sys.dm_os_performance_counters > 500/sec
Severity: Warning if 500–999/sec; Critical if ≥ 1,000/sec
Fix: Frequent compilations burn CPU and fragment the plan cache. Causes: unparameterized ad-hoc SQL (fix: O6), use of WITH RECOMPILE hints on hot paths, DDL/statistics changes triggering recompiles, or SET option changes per connection. Run /tsql-review on the top offending query text to check for recompile hints.

O8 — Large Individual Plan in Cache

Trigger: Any single plan in sys.dm_exec_cached_plans has size_in_bytes > 10 MB
Severity: Warning if 10–49 MB; Critical if ≥ 50 MB
Fix: Large plans typically come from queries with many joins, OR-heavy predicates, or dynamic SQL generating enormous statement trees. A 50 MB plan indicates a query that is too complex to compile efficiently. Review the query for: missing indexes that force large merge trees, OR predicates that could be rewritten as UNION, or dynamically constructed WHERE clauses that generate cartesian predicate combinations.

O9 — Lock Memory Excessive

Trigger: OBJECTSTORE_LOCK_MANAGER clerk in sys.dm_os_memory_clerks has pages_kb > 100,000 KB (100 MB)
Severity: Warning if 100–499 MB; Critical if ≥ 500 MB
Fix: SQL Server allocates a lock structure per row/page/object lock held. Excessive lock memory means: (1) row-level locking on very large result sets — review transactions holding locks and add appropriate WHERE clauses; (2) lock escalation is disabled or failing — check sys.dm_tran_locks for escalation events; (3) long-running transactions holding many locks — pair with /sqlwait-review for LCK_ wait analysis.

O10 — Plan Cache Churn (Low Hit Rate)

Trigger: Plan cache hit rate (= Cache Hit Ratio counter from the Plan Cache performance object) < 90% over a sustained period
Severity: Warning if 80–89%; Critical if < 80%
Fix: A cache hit rate below 90% means SQL Server is recompiling more than 10% of queries. Pair with O6 (single-use bloat) and O7 (compile rate) to determine whether the cause is ad-hoc queries or forced recompiles. Enabling optimize for ad hoc workloads is the first mitigation; review for parameterization improvements next.

Memory Grant Checks (O11–O15)

O11 — Memory Grant Queue Depth

Trigger: sys.dm_exec_query_memory_grants has any row where grant_time IS NULL (query waiting for a grant) — i.e., wait_order > 0
Severity: Warning if 1–5 sessions waiting; Critical if > 5 sessions waiting
Fix: Queries are queuing for memory grants, meaning the server has insufficient memory for parallel sort, hash join, and hash aggregate operations. Immediate: identify the queries holding large grants (column granted_memory_kb) and consider OPTION (MAXDOP 1) or OPTION (MAX_GRANT_PERCENT = 10) as emergency mitigations. Long-term: add indexes to eliminate hash operations; increase Max Server Memory if available physical memory allows (O20).

O12 — Memory Grant Timeout

Trigger: Error 8645 ("A timeout occurred while waiting for memory resources to execute the query in the resource pool") OR error 701 ("There is insufficient system memory in resource pool 'default' to run this query") in the error log. Note: timeout_sec in sys.dm_exec_query_memory_grants is the configured time-out before the query gives up its grant request, not an indicator that a timeout occurred
Severity: Critical
Fix: Queries are failing due to memory exhaustion. Immediate actions: (1) kill the sessions holding the largest grants; (2) set Resource Governor minimum memory grant percent lower; (3) add OPTION (MIN_GRANT_PERCENT = 1) to the offending query. Root cause: the query's estimated row count is drastically wrong, causing an oversized grant estimate — run /sqlplan-review to check N21 (cardinality estimate accuracy) and /sqlstats-review for stale statistics.

O13 — Oversized Memory Grant

Trigger: Any single session has granted_memory_kb > 25% of max_memory_grant for the resource pool, OR max_used_memory_kb / granted_memory_kb < 0.25 (granted 4× more than actually used)
Severity: Warning
Fix: The query received a large grant but used a fraction of it, blocking other queries from getting grants (see O11). The grant overestimate almost always traces to stale or low-sample statistics on join input tables. Run /sqlplan-review on this query and look for N21 (row count estimate mismatch). Fix statistics with UPDATE STATISTICS ... WITH FULLSCAN.

O14 — Resource Governor Memory Pool Imbalance

Trigger: sys.dm_resource_governor_resource_pools shows a user pool with max_memory_percent = 100 or min_memory_percent = 0 when multiple pools are in use
Severity: Warning
Fix: Resource Governor is configured but workload pools are not memory-bounded. Set min_memory_percent and max_memory_percent on high-priority pools (OLTP) to protect them from being crowded out by reporting workloads. Use ALTER RESOURCE POOL + ALTER RESOURCE GOVERNOR RECONFIGURE.

O15 — Buffer Pool Extension (BPE) Active on SSDs

Trigger: sys.dm_os_buffer_pool_extension_configuration shows state = 5 (BUFFER POOL EXTENSION ENABLED)
Severity: Info
Fix: Buffer Pool Extension is enabled. Verify the BPE file is on a fast NVMe/SSD — placing it on a spinner negates any benefit and can worsen latency. Note: BPE is available in Enterprise and Standard editions only (introduced SQL Server 2014); if the workload outgrows it, additional physical RAM outperforms BPE.

Memory Clerk, OS Pressure, and Configuration Checks (O16–O20)

O16 — ColumnStore Buffer Pool Pressure

Trigger: CACHESTORE_COLUMNSTOREOBJECTPOOL clerk in sys.dm_os_memory_clerks has pages_kb > 25% of committed_target_kb
Severity: Warning if 25–49%; Critical if ≥ 50%
Fix: Columnstore index structures (column segments and dictionaries cached for query execution) are consuming a large share of SQL Server memory. This is expected on a ColumnStore-heavy workload but reduces buffer pool space. Mitigations: increase total RAM; if the server runs both OLTP and ColumnStore workloads, separate them with Resource Governor; ensure ColumnStore indexes only exist on tables that benefit from them — run /tsql-review to check for unnecessary column store usage (T-series checks for ColumnStore on low-cardinality tables).

O17 — In-Memory OLTP (XTP) Memory Pressure

Trigger: The MEMORYCLERK_XTP clerk has pages_kb > 25% of committed_target_kb
Severity: Warning if 25–49%; Critical if ≥ 50%
Fix: In-Memory OLTP memory is unbounded by default — bind the database to a Resource Governor resource pool with MIN_MEMORY_PERCENT/MAX_MEMORY_PERCENT set, using sp_xtp_bind_db_resource_pool (database must go offline/online for the binding to take effect). If XTP memory is growing, check: sys.dm_db_xtp_memory_consumers for the largest consumers; whether durable in-memory tables have an appropriate checkpoint file footprint; whether non-durable (SCHEMA_ONLY) in-memory tables are accumulating rows that are never cleaned up.

O18 — OS Memory Pressure Notifications

Trigger: sys.dm_os_ring_buffers WHERE ring_buffer_type = N'RING_BUFFER_RESOURCE_MONITOR' contains records with RESOURCE_MEMPHYSICAL_LOW or RESOURCE_MEMVIRTUAL_LOW notifications in the last 24 hours
Severity: Warning for RESOURCE_MEMPHYSICAL_LOW; Critical for RESOURCE_MEMVIRTUAL_LOW
Fix: Windows is signaling SQL Server that physical or virtual memory is critically low. SQL Server will shrink its buffer pool in response, causing PLE drops (O1). Root causes: (1) Max Server Memory set too high — another process on the server needs memory; (2) physical RAM is genuinely insufficient for the workload; (3) memory leak in a third-party component. Review sys.dm_os_memory_clerks for unexpected non-buffer growth. Set Max Server Memory to leave at least 4–10% or 4 GB (whichever is larger) for the OS.

O19 — Lock Pages in Memory (LPIM) Misconfiguration

Trigger: Server RAM > 32 GB AND sys.dm_os_sys_info.sql_memory_model_desc = 'CONVENTIONAL' (LPIM not enabled on a large-memory server)
Severity: Warning
Fix: Without Lock Pages in Memory, Windows can page out the SQL Server buffer pool to disk even when sys.dm_os_sys_memory shows available memory. On servers with > 32 GB RAM running under memory pressure, enable LPIM via Group Policy (Lock Pages in Memory user right for the SQL Server service account). LPIM prevents OS paging but does not protect against OOM — Max Server Memory must still be set correctly (O20).

O20 — Max Server Memory Not Explicitly Set

Trigger: sys.configurations WHERE name = 'max server memory (MB)' has value_in_use = 2147483647 (the default unlimited value)
Severity: Critical
Fix: Without an explicit Max Server Memory limit, SQL Server can consume all RAM, starving the OS, other services, and connection overhead. Set Max Server Memory using the formula: Total RAM − OS overhead (4–10%) − SSAS/SSRS/SSIS if co-located − 1 GB per 4 GB RAM for non-buffer memory. Use sp_configure 'max server memory (MB)', <calculated value>; RECONFIGURE. Then verify LPIM is enabled (O19) on large-memory servers.

Output Format

Present findings in this order:

Memory Pressure Summary — one sentence: is the server memory-pressured (PLE, grants queuing, OS notifications)?
Findings table — one row per triggered check:

Check	Severity	Metric	Finding	Fix
O1	Critical	PLE = 45 s	Well below 300 s threshold; dropping 3 s/min	Increase Max Server Memory or identify scan-heavy queries

Root cause hypothesis — what is most likely causing the pressure? List top 1–3 ranked by evidence weight.
Recommended next steps — ordered action list with companion skill references.

Analyzed by: sqlmemory-review (O1–O20)

Companion Skills

/sqlwait-review — RESOURCE_SEMAPHORE (V11), CMEMTHREAD (V12), and PAGEIOLATCH (V1–V3) waits are the live signal of memory pressure that this skill explains at the configuration level
/sqltrace-review — identify which workload queries are flooding the buffer pool or acquiring large memory grants
/sqlplan-review — N21 (row count estimate mismatch) and N15/N16 (sort/hash spills) explain why grants are oversized
/sqlquerystore-review — Q23 (memory grant regression) and Q24 (spill regression) link plan changes to memory pressure events
/sqldiskio-review — if PLE is low and sys.dm_io_virtual_file_stats shows high latency, the working set is spilling to disk; disk I/O is the symptom of memory pressure

VERSION_COMPATIBILITY

See skills/VERSION_COMPATIBILITY.md for the full compatibility matrix.

Check	2008 R2	2012	2014	2016	2017	2019	2022	Azure SQL
O1 PLE	✓	✓	✓	✓	✓	✓	✓	✓
O2 PLE Trend	✓	✓	✓	✓	✓	✓	✓	✓
O3 NUMA PLE	—	✓	✓	✓	✓	✓	✓	—
O4 DB concentration	✓	✓	✓	✓	✓	✓	✓	✓
O5 Stolen memory	✓	✓	✓	✓	✓	✓	✓	Partial
O6 Single-use plan	✓	✓	✓	✓	✓	✓	✓	✓
O7 Compile rate	✓	✓	✓	✓	✓	✓	✓	✓
O8 Large plan	✓	✓	✓	✓	✓	✓	✓	✓
O9 Lock clerk	✓	✓	✓	✓	✓	✓	✓	✓
O10 Plan hit rate	✓	✓	✓	✓	✓	✓	✓	✓
O11 Grant queue	✓	✓	✓	✓	✓	✓	✓	✓
O12 Grant timeout	✓	✓	✓	✓	✓	✓	✓	✓
O13 Oversized grant	✓	✓	✓	✓	✓	✓	✓	✓
O14 RG pool	✓	✓	✓	✓	✓	✓	✓	✓
O15 BPE	—	—	✓	✓	✓	✓	✓	—
O16 ColumnStore	—	—	✓	✓	✓	✓	✓	✓
O17 XTP/In-Memory	—	—	✓	✓	✓	✓	✓	✓
O18 OS pressure	✓	✓	✓	✓	✓	✓	✓	—
O19 LPIM	✓	✓	✓	✓	✓	✓	✓	N/A
O20 Max Server Mem	✓	✓	✓	✓	✓	✓	✓	N/A