lading-optimize-review - SKILL.md Agent Skill

name: lading-optimize-review description: Reviews optimization patches using a 5-persona peer review system. Requires unanimous approval backed by benchmarks. argument-hint: "[bench] [fingerprint] [file] [target] [technique]" allowed-tools: Bash(cat:) Bash(sample:) Bash(samply:) Bash(cargo:) Bash(ci/:) Bash(hyperfine:) Bash(/payloadtool:) Bash(tee:) Read Glob Grep context: fork

Optimization Patch Review

A rigorous 5-persona peer review system for optimization patches in lading. Requires unanimous approval backed by concrete benchmark data. Duplicate Hunter persona prevents redundant work.

Role: Judge

Review is the decision-maker. It does NOT record results.

Review judges using benchmarks and 5-persona review, then returns a structured report.

Outcomes

Outcome	Votes	Action
APPROVED	5/5 APPROVE	Return APPROVED report
REJECTED	Any REJECT	Return REJECTED report

Arguments

This skill requires 5 positional arguments passed by the caller:

Arg	Field	Example	Used for
`$ARGUMENTS[0]`	bench	`trace_agent`	`cargo criterion --bench` flag
`$ARGUMENTS[1]`	fingerprint	`ci/fingerprints/trace_agent_v04/lading.yaml`	payloadtool config path
`$ARGUMENTS[2]`	file	`lading_payload/src/trace_agent/v04.rs`	report + duplicate check
`$ARGUMENTS[3]`	target	`V04::to_bytes`	report
`$ARGUMENTS[4]`	technique	`buffer-reuse`	report + duplicate check

If any argument is missing -> REJECT. All 5 are required.

Generate Report ID

Derive the id from the file and technique arguments:

Take the filename stem from $ARGUMENTS[2] (e.g. lading_payload/src/trace_agent/v04.rs → trace-agent-v04)
Append the technique $ARGUMENTS[4] (e.g. buffer-reuse)
Join with - → trace-agent-v04-buffer-reuse

Use this id in the report.

Phase 1: Benchmark Execution

Step 1: Read Baseline Data

Read the baseline benchmark files captured:

/tmp/criterion-baseline.log — micro-benchmark baseline
/tmp/baseline.json — macro-benchmark timing baseline
/tmp/baseline-mem.txt — macro-benchmark memory baseline

If baseline data is missing -> REJECT. Baselines must be captured before any code change and before this gets invoked.

Step 2: Run Post-Change Micro-benchmarks

cargo criterion --bench $ARGUMENTS[0] 2>&1 | tee /tmp/criterion-optimized.log

Note: Criterion automatically compares against the last run and reports percentage changes.

Compare results — look for "change:" lines showing improvement/regression.

Example output: time: [1.2345 ms 1.2456 ms 1.2567 ms] change: [-5.1234% -4.5678% -4.0123%]

Step 3: Run Post-Change Macro-benchmarks

cargo build --release --bin payloadtool
hyperfine --warmup 3 --runs 30 --export-json /tmp/optimized.json \
  "./target/release/payloadtool $ARGUMENTS[1]"
./target/release/payloadtool "$ARGUMENTS[1]" --memory-stats 2>&1 | tee /tmp/optimized-mem.txt

Statistical Requirements

Minimum 30 runs for hyperfine (--runs 30)
Criterion handles statistical significance internally
Time improvement >= 5% for significance
Memory improvement >= 10% for significance
Allocation reduction >= 20% for significance

NO EXCEPTIONS

"Test dependencies don't work" -> REJECT. Fix dependencies first.
"Theoretically better" -> REJECT. Prove it with numbers.
"Obviously an improvement" -> REJECT. Obvious is not measured.
"Will benchmark later" -> REJECT. Benchmark now.

Phase 2: Five-Persona Review

1. Duplicate Hunter (Checks for Redundant Work)

Check .claude/skills/lading-optimize-hunt/assets/db.yaml for $ARGUMENTS[2] + $ARGUMENTS[4] combo
File + technique combo not already approved
No substantially similar optimization exists
If duplicate found -> REJECT with "DUPLICATE: see "

2. Skeptic (Demands Proof)

Baseline data verified present
Post-change benchmarks executed with identical methodology
Hot path verified via profiling (not just guessed)
Micro threshold met (>=5% time)
Macro threshold met (>=5% time OR >=10% mem OR >=20% allocs)
Statistical significance confirmed (p<0.05 or criterion "faster")
Improvement is real, not measurement noise
Benchmark methodology sound (same config, same machine)

3. Conservative (Guards Correctness)

Run ci/validate and validate that it passes completely
No semantic changes to output
Determinism preserved (same seed -> same output)
No .unwrap() or .expect() added (lading MUST NOT panic)
No bugs introduced (if bug found -> REJECT with bug details)
Property tests exist for changed code

4. Rust Expert (Lading-Specific Patterns)

No mod.rs files (per CLAUDE.md)
All use statements at file top (not inside functions)
Format strings use named variables ("{index}" not "{}")
Pre-computation in initialization, not hot paths
Worst-case behavior considered, not just average-case
No unnecessary cloning or allocation in hot paths

5. Greybeard (Simplicity Judge)

Code still readable without extensive comments
Complexity justified by measured improvement
"Obviously fast" pattern, not clever trick
Follows 3-repeat abstraction rule (no premature abstraction)
Change is minimal - no scope creep

Phase 3: Kani/Property Test Check

If the optimization touches critical code:

For lading_throttle:

ci/kani lading_throttle

For lading_payload:

ci/kani lading_payload

Kani constraints:

Kani proofs are more complete but labor-intensive
Kani may not compile complex code
Kani runs are EXTREMELY slow for complex code

If Kani fails to run:

Document why (compilation error? timeout?)
Verify comprehensive property tests exist instead
This is acceptable - Kani feasibility varies

Phase 4: Decision

Outcome	Votes	Action
APPROVED	5/5 APPROVE	Return APPROVED report
REJECTED	Any REJECT	Return REJECTED report

Duplicates, bugs, correctness issues, and missing benchmarks are all rejections. Describe the specific reason in the report's reason field.

Phase 5: Return Report

Review does NOT record results and does NOT create files. Return a structured YAML report to the caller.

Fill in the appropriate template and return the completed YAML:

Verdict	Template
approved	`.claude/skills/lading-optimize-review/assets/approved.template.yaml`
rejected	`.claude/skills/lading-optimize-review/assets/rejected.template.yaml`

Read the appropriate template from the .claude/skills/lading-optimize-review/assets/ directory
Fill in placeholders using argument values:
- id → generated ID (see "Generate Report ID" above)
- target → $ARGUMENTS[2]:$ARGUMENTS[3] (e.g. lading_payload/src/trace_agent/v04.rs:V04::to_bytes)
- technique → $ARGUMENTS[4]
Fill remaining fields with actual benchmark data from the review
Return the filled-in report