performance-optimization - SKILL.md Agent Skill

name: performance-optimization
description: "Applies measure-first performance optimization: profiles to find hot spots, applies algorithm and data-structure improvements before micro-optimizations, and validates each change to prevent regressions."
disable-model-invocation: true

Skill: performance-optimization

STOP - Measure First (MANDATORY GATE)

Do not optimize based on intuition -- profile first.

Correctness before speed -- make it work, then make it fast
<4% of code causes >50% of runtime (Knuth 1971) -- find the hot spot before touching anything
>50% of optimizations produce negligible or negative results -- measurement prevents wasted effort

No measurement = no optimization. This gate is non-negotiable.

Scope Limitations

This skill covers single-threaded, single-process code tuning for general-purpose computing.

Not covered (need specialized guidance):

Concurrency: Lock contention often dominates; profile thread states, not just CPU
Distributed systems: A network round trip costs roughly 10,000x a memory access; optimize RPC/serialization first
Real-time systems: Need worst-case latency, not average; caching adds variance
Embedded/constrained: Memory/power budgets require different tradeoffs

The Simplicity-Performance Relationship

Simpler code usually runs faster. Fewer special cases mean fewer conditions to check on every call. Deep modules do more work per call and cross fewer layers. Complicated code tends to do extraneous or redundant work.

Primary Workflow: 7-Step Decision Tree

Each step is a gate. Do NOT skip steps.

1. Is the program correct and complete?
   NO  -> Make it correct first. STOP optimization.
   YES -> Continue

2. Have you measured to find the actual bottleneck?
   NO  -> Profile/measure first. Do NOT guess.
   YES -> Continue

3. Can requirements be relaxed?
   YES -> Relax requirements. Done.
   NO  -> Continue

4. Can design/architecture solve it? (Stage 2: Fundamental Fixes)
   YES -> Fix design. Done.
   NO  -> Continue

5. Can algorithm/data structure solve it?
   YES -> Change algorithm. Done.
   NO  -> Continue

6. Can compiler flags help? (40-59% improvement possible)
   YES -> Enable optimizations. Measure.
   NO  -> Continue

7. Is it in the <4% that causes >50% of runtime?
   NO  -> Do NOT optimize this code. Find actual hot spot.
   YES -> PROCEED with code tuning (see below)

Step 2 Detail: Measurement

What counts as valid measurement:

Actual profiling data (timing, call counts, memory usage)
Multiple runs to account for variance
Specific hotspot identification, not just "it's slow"

Identify WHICH dimension: throughput, latency, memory, or CPU. Different problems need different solutions.
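
A minimal sketch of what valid measurement can look like in Python, using only the standard library. The workload `slow_sum_of_squares` and the helper names `profile_hotspots` and `time_runs` are hypothetical, chosen for illustration; substitute the real entry point of the code under investigation.

```python
import cProfile
import io
import pstats
import statistics
import timeit


def slow_sum_of_squares(n):
    """Hypothetical workload: deliberately naive so the profiler has something to find."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_hotspots(func, *args, top=5):
    """Run func under cProfile and return the top entries by cumulative time as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


def time_runs(func, *args, repeats=5, number=10):
    """Time several independent runs so run-to-run variance stays visible."""
    timings = timeit.repeat(lambda: func(*args), repeat=repeats, number=number)
    return {
        "min": min(timings),
        "median": statistics.median(timings),
        "max": max(timings),
    }


if __name__ == "__main__":
    print(profile_hotspots(slow_sum_of_squares, 200_000))
    print(time_runs(slow_sum_of_squares, 200_000))
```

The profile report names the specific hot spot, and the min/median/max spread shows whether a later before/after difference is larger than the noise.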

Step 4 Detail: Fundamental Fixes (APOSD Stage 2)

Before code-level changes, check for architectural fixes:

Add a cache? Eliminate repeated expensive computation
Better algorithm? e.g., balanced tree vs. list, hash map vs. linear search
Bypass layers? e.g., kernel bypass for networking, direct buffer access

If a fundamental fix exists, implement it with standard design techniques. If not, continue down the tree.
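
Two of the fixes above can be sketched briefly in Python. The function names and the shape of the `users` records are assumptions made for illustration; `functools.lru_cache` is the standard-library memoization decorator, and it is only safe here because the cached function is pure.

```python
from functools import lru_cache


@lru_cache(maxsize=1024)
def expensive_score(n):
    """Hypothetical stand-in for a costly, pure computation; repeat calls hit the cache."""
    return sum(i * i for i in range(n))


def find_user_linear(users, user_id):
    """O(n) per lookup: scans the list until the id matches."""
    for user in users:
        if user["id"] == user_id:
            return user
    return None


def build_user_index(users):
    """One O(n) pass builds a hash map; each later lookup is O(1) on average."""
    return {user["id"]: user for user in users}


if __name__ == "__main__":
    users = [{"id": i, "name": f"user{i}"} for i in range(1000)]
    index = build_user_index(users)
    # Both approaches return the same record; only the lookup cost differs.
    print(find_user_linear(users, 999) is index.get(999))
    expensive_score(10_000)
    expensive_score(10_000)
    print(expensive_score.cache_info())
```

Whether either fix pays off still depends on the measured workload: a cache helps only when inputs repeat, and an index helps only when lookups outnumber rebuilds.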

Step 4 Extended: Critical Path Redesign (APOSD Stage 3)

When no fundamental fix is available, redesign the critical path:

Ask: What is the smallest amount of code for the common case?
Disregard existing code structure entirely
Ignore special cases in current code -- consider only data needed for critical path
Define "the ideal" -- simplest and fastest code assuming complete redesign freedom
Design the rest of the class around these critical paths

Consolidation techniques:

Technique	Example
Encode multiple conditions in single value	Variable that is 0 when any special case applies
Single test for multiple cases	Replace 6 individual checks with 1 combined check
Combine layers into single method	Critical path handled in one method, not three
Merge variables	Combine multiple values into single structure
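
The first two techniques in the table can be illustrated with a small sketch. `ChunkBuffer` is a hypothetical append-only buffer, not taken from any library. Its `spare` field is 0 whenever any special case applies (no chunk exists yet, or the last chunk is full), so one comparison on the critical path covers every special case.

```python
class ChunkBuffer:
    """Hypothetical append-only buffer built from fixed-size chunks."""

    def __init__(self, chunk_size=16):
        self.chunk_size = chunk_size
        self.chunks = []
        # Free bytes in the last chunk. It is 0 when there is no chunk yet
        # or the last chunk is full, so one test detects both special cases.
        self.spare = 0

    def append(self, data):
        n = len(data)
        if 0 < n <= self.spare:
            # Critical path: a single test, then a direct copy.
            self.chunks[-1] += data
            self.spare -= n
            return
        self._append_slow(data)

    def _append_slow(self, data):
        """Handles every special case: empty buffer, full chunk, oversized data."""
        pos = 0
        while pos < len(data):
            if self.spare == 0:
                self.chunks.append(bytearray())
                self.spare = self.chunk_size
            take = min(self.spare, len(data) - pos)
            self.chunks[-1] += data[pos:pos + take]
            self.spare -= take
            pos += take

    def tobytes(self):
        return b"".join(bytes(chunk) for chunk in self.chunks)
```

The common case (a small append that fits) never looks at the chunk list length or the chunk size; all of that bookkeeping lives in the slow path.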

Code Tuning Procedure (STRICT ORDER)

Only reached after completing the 7-step decision tree.

1. Save working version (cannot revert without backup)
2. Make ONE change (multiple changes = unmeasurable)
3. Measure improvement (same workload, before/after)
4. Keep if measurably faster, revert if not (no "close enough")
5. Repeat
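
Steps 3 and 4 can be sketched as a small harness. Everything here is an illustrative assumption: the names `best_time` and `evaluate_change`, the example functions, and especially the `min_speedup` threshold, which should be set from the noise observed in your own measurements.

```python
import timeit


def best_time(func, workload, repeats=5, number=10):
    """Best-of-N wall time for func(workload); the minimum is least affected by noise."""
    return min(timeit.repeat(lambda: func(workload), repeat=repeats, number=number))


def evaluate_change(baseline, candidate, workload, min_speedup=1.05,
                    repeats=5, number=10):
    """Compare one candidate change against the saved baseline on the same workload."""
    # Correctness gate first: a faster wrong answer is never kept.
    if baseline(workload) != candidate(workload):
        return {"keep": False, "reason": "results differ", "speedup": None}
    before = best_time(baseline, workload, repeats, number)
    after = best_time(candidate, workload, repeats, number)
    speedup = before / after
    keep = speedup >= min_speedup  # assumed threshold, not a universal rule
    reason = "faster" if keep else "not measurably faster"
    return {"keep": keep, "reason": reason, "speedup": speedup}


def total_loop(values):
    """Baseline: explicit Python loop."""
    total = 0
    for v in values:
        total += v
    return total


def total_builtin(values):
    """Candidate: the single change under test."""
    return sum(values)


if __name__ == "__main__":
    print(evaluate_change(total_loop, total_builtin, list(range(50_000))))
```

Because only one change is evaluated per call, a "revert" verdict points at exactly one edit to undo.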

Technique Priority by Category

Logic:

Stop testing when answer known (break, short-circuit)
Order tests by frequency (most common first)
Substitute table lookups for complex logic
Use lazy evaluation
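
As a rough illustration of two of the logic techniques, the sketch below stops testing as soon as the answer is known and replaces a nested branch chain with a table lookup. The discount rules and function names are made up for the example.

```python
def contains_negative(values):
    """Stop testing when the answer is known: return at the first negative value."""
    for v in values:
        if v < 0:
            return True
    return False


def discount_branches(is_member, is_senior):
    """Nested conditionals: every new rule adds another branch to read and test."""
    if is_member:
        if is_senior:
            return 25
        return 15
    if is_senior:
        return 10
    return 0


# Table lookup: the same (hypothetical) rules expressed as data.
DISCOUNT_TABLE = {
    (False, False): 0,
    (False, True): 10,
    (True, False): 15,
    (True, True): 25,
}


def discount_lookup(is_member, is_senior):
    """One dictionary access replaces the branch chain."""
    return DISCOUNT_TABLE[(is_member, is_senior)]
```

With only four cases the table is mainly a readability gain; any speed difference would need to be measured on the real rule set before it is kept.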

Loops:

Unswitch (move invariant tests outside)
Jam/fuse loops operating on same range
Put busiest loop on inside
Minimize work inside loops
Use sentinel values for search loops
Unroll ONLY if measured (unrolling can backfire: one Python measurement ran 27% slower)
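
Unswitching can be sketched as follows. The `clamp` flag and the limit of 100 are hypothetical; the point is that a test whose result cannot change during the loop is decided once, outside it.

```python
def scale_switched(values, factor, clamp):
    """Before: the invariant test on `clamp` is re-evaluated on every iteration."""
    out = []
    for v in values:
        if clamp:
            out.append(min(v * factor, 100))
        else:
            out.append(v * factor)
    return out


def scale_unswitched(values, factor, clamp):
    """After: the test is hoisted, leaving one specialized loop per case."""
    if clamp:
        return [min(v * factor, 100) for v in values]
    return [v * factor for v in values]
```

The cost is a duplicated loop body, so this trade is worth keeping only when measurement shows the loop is a hot spot and the unswitched version is actually faster.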

Data:

Use integers instead of floating-point when possible
Use fewest array dimensions
Cache frequently computed values
Precompute results where practical
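
Precomputation can be illustrated with a bit-counting table. This is a sketch under the assumption that the same small input domain (byte values 0-255) is queried many times; the names `BIT_COUNTS`, `popcount_naive`, and `popcount_table` are illustrative.

```python
# Precomputed once at import time: the number of set bits for every byte value.
BIT_COUNTS = bytes(bin(i).count("1") for i in range(256))


def popcount_naive(data):
    """Recomputes the bit count of each byte from scratch on every call."""
    return sum(bin(b).count("1") for b in data)


def popcount_table(data):
    """Looks up each byte's bit count in the precomputed table."""
    return sum(BIT_COUNTS[b] for b in data)
```

The table costs 256 bytes of memory and a small startup delay, which is the usual trade for precomputed results.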

Expressions:

Initialize at compile time
Exploit algebraic identities
Use strength reduction (multiplication -> addition)
Eliminate common subexpressions
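
Strength reduction and common-subexpression elimination might look like this in Python. The function names and the loan-payment formula are illustrative assumptions, and in an interpreted language the gain may be small or absent, so measure before keeping either form.

```python
def row_offsets_multiply(rows, stride):
    """Before: one multiplication per row."""
    return [r * stride for r in range(rows)]


def row_offsets_add(rows, stride):
    """After (strength reduction): a running addition replaces the multiplication."""
    offsets = []
    offset = 0
    for _ in range(rows):
        offsets.append(offset)
        offset += stride
    return offsets


def payment_repeated(principal, annual_rate, months):
    """Before: the subexpression annual_rate / 12 is computed twice."""
    return principal * (annual_rate / 12) / (1 - (1 + annual_rate / 12) ** -months)


def payment_cse(principal, annual_rate, months):
    """After: the common subexpression is computed once and named."""
    monthly_rate = annual_rate / 12
    return principal * monthly_rate / (1 - (1 + monthly_rate) ** -months)
```

Naming the shared subexpression also documents what it means, so this change can be kept on the "simpler and at least as fast" criterion even when the speedup is negligible.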

After Making Changes

Checklist and code examples: Read(${CLAUDE_SKILL_DIR}/checklists.md)

Re-measure before keeping any change. Keep a change only if it yields a significant speedup backed by data, or if it makes the code simpler and at least as fast. Otherwise, back it out.

Red Flags

Red Flag	Symptom
Premature Optimization	Optimizing without measurement
Death by a Thousand Cuts	Many small inefficiencies; no single fix helps (system ends up 5-10x slower than necessary)
Pass-Through Methods	Identical signature to caller, unnecessary layer crossing
Shallow Layers	Multiple layers providing same abstraction
Repeated Special Cases	Same conditions checked multiple times
Trading maintainability for <10% gain	Complex optimization for minor speedup

Quick Reference

Threshold/Rule	Value	Source
Hot spot concentration	<4% causes >50% runtime	Knuth 1971
Failed optimization rate	>50% negligible or negative	CC p.607
Compiler optimization gains	40-59% improvement possible	CC p.596
I/O vs memory	~1000x difference	CC p.591

Checker

Checklist: Read(${CLAUDE_SKILL_DIR}/checklists.md)

Output Format:

Item	Status	Evidence	Location
Measured before tuning?	VIOLATION	No profiler/measurement found	N/A
Loop unswitching opportunity	WARNING	Invariant `if (debug)` inside loop	app.py:142

Severity: VIOLATION (clear anti-pattern), WARNING (needs measurement), PASS (no issues)

Chain

After	Next
Optimization complete	Verify design not degraded
Structure degraded	cc-refactoring-guidance