performance-benchmark

name: performance-benchmark description: >- Capture runtime performance metrics (Core Web Vitals, resource sizes, load times) against defined budgets. Compare to baselines, flag regressions, and maintain trend history. Complements the code-level performance-review agent with actual runtime measurement. role: worker user-invocable: false

Performance Benchmark Skill

Measures runtime performance (what the user experiences) — complements the static performance-review agent. Uses Playwright + Chromium to load pages, collect timing via the Performance API, measure resource sizes, and compare against baselines and budgets.

Prerequisites

npx playwright install chromium

A running dev server (or accessible URL) for the pages being benchmarked.

Metrics collected

Core Web Vitals

Metric	API	Budget Default	Measures
LCP (Largest Contentful Paint)	`PerformanceObserver('largest-contentful-paint')`	≤ 2500ms	When main content is visible
FID (First Input Delay)	`PerformanceObserver('first-input')`	≤ 100ms	First-interaction responsiveness
CLS (Cumulative Layout Shift)	`PerformanceObserver('layout-shift')`	≤ 0.1	Visual stability during load
INP (Interaction to Next Paint)	`PerformanceObserver('event')`	≤ 200ms	Responsiveness across lifecycle

Navigation Timing

Metric	API	Measures
TTFB	`performance.timing.responseStart - navigationStart`	Server response time
FCP (First Contentful Paint)	`PerformanceObserver('paint')`	When first content appears
DOM Interactive	`performance.timing.domInteractive`	When DOM is parseable
Load Complete	`performance.timing.loadEventEnd`	Page fully loaded

Resource metrics

Metric	Collection	Budget Default
Total transfer size	`performance.getEntriesByType('resource')` sum	≤ 500KB
JS bundle size	Resources with `.js` extension	≤ 200KB
CSS bundle size	Resources with `.css` extension	≤ 50KB
Image payload	Resources with image MIME types	≤ 300KB
Request count	Resource entry count	≤ 50
Largest resource	Max single transfer size	≤ 150KB

Collection procedure

Run a Playwright script that:

Launches headless Chromium with consistent viewport (1280×720) and CPU throttling (4x for mobile, 1x for desktop)
Navigates with waitUntil: 'networkidle'
Injects a Performance Observer for Web Vitals
Waits 2s after load for metrics to stabilize
Collects all performance.getEntriesByType('resource') entries
Returns structured JSON

Full script template: references/benchmark-script.md.

Reliability: run each page 3 times, take the median. Use --disable-gpu and --disable-extensions. Clear cookies and use a fresh context per run. No network throttling by default; --3g flag for mobile simulation.

Modes

Baseline (`--baseline`)

Capture current metrics and save as the baseline:

benchmarks/<slug>/baseline.json

Commit baselines so the team shares a reference point.

Compare (default)

Run and compare against the saved baseline:

Change	Status
Metric worsened > 10%	`fail` (regression)
Metric worsened 5–10%	`warn` (degradation)
Metric improved > 10%	noted in report
Within ±5%	`pass` (stable)

Budget (`--budget`)

Check absolute budgets from performance-budget.json at the project root:

{
  "budgets": [
    {
      "path": "/",
      "metrics": {
        "LCP": 2500,
        "CLS": 0.1,
        "totalTransferSize": 500000,
        "jsSize": 200000
      }
    },
    {
      "path": "/dashboard",
      "metrics": { "LCP": 3000, "totalTransferSize": 800000 }
    }
  ]
}

If no budget file, fall back to defaults above.

Trend (`--trend`)

Read benchmarks/<slug>/history.jsonl and produce a trend summary across the last N runs.

Output format (JSON)

{
  "url": "http://localhost:3000/dashboard",
  "timestamp": "2026-04-10T14:30:00Z",
  "runs": 3,
  "device": "desktop",
  "metrics": {
    "LCP": {"median": 1850, "p95": 2100, "unit": "ms"},
    "FCP": {"median": 920, "p95": 1050, "unit": "ms"},
    "CLS": {"median": 0.05, "p95": 0.08, "unit": "score"},
    "INP": {"median": 120, "p95": 180, "unit": "ms"},
    "TTFB": {"median": 210, "p95": 280, "unit": "ms"},
    "loadComplete": {"median": 2400, "p95": 2800, "unit": "ms"}
  },
  "resources": {
    "totalTransferSize": 342000,
    "jsSize": 156000,
    "cssSize": 28000,
    "imageSize": 98000,
    "requestCount": 34,
    "largestResource": {"url": "/assets/main.js", "size": 95000}
  },
  "comparison": {
    "baseline": "2026-04-08T10:00:00Z",
    "regressions": [
      {"metric": "LCP", "baseline": 1600, "current": 1850, "change": "+15.6%", "severity": "fail"}
    ],
    "improvements": [],
    "stable": ["FCP", "CLS", "INP", "TTFB"]
  },
  "budget": { "status": "pass", "violations": [] },
  "status": "pass|warn|fail"
}

Human-readable report template: examples/report-format.md.

File storage

benchmarks/
├── <page-slug>/
│   ├── baseline.json        # Committed — shared baseline
│   └── history.jsonl        # Committed — trend data (append-only)
└── performance-budget.json  # Committed — budget definitions

Page slug derived from the URL path: /dashboard → dashboard, / → index.