name: canary description: Post-deploy monitoring: watch production after a deploy and alert on regressions argument-hint: "[--baseline]" effort: medium disable-model-invocation: true
Canary: Post-Deploy Monitoring
Watch a live application after deployment. Alert on errors and regressions. Compare against a pre-deploy baseline.
Two modes:
--baseline: capture the current state BEFORE deploying- (default): monitor AFTER deploying and compare against baseline
Instructions
Phase 1: Setup
Parse the user's arguments and detect the deployment context.
# Detect current branch and recent deploy commit
git branch --show-current
git log --oneline -5
# Auto-detect platform from config files
[ -f fly.toml ] && echo "PLATFORM: fly"
[ -f render.yaml ] && echo "PLATFORM: render"
[ -f vercel.json ] && echo "PLATFORM: vercel"
[ -f netlify.toml ] && echo "PLATFORM: netlify"
[ -f Procfile ] && echo "PLATFORM: heroku"
[ -f railway.toml ] && echo "PLATFORM: railway"
# Check for health endpoint
curl -sf "${URL}/health" -w "\n%{http_code}" 2>/dev/null | tail -1
curl -sf "${URL}/api/health" -w "\n%{http_code}" 2>/dev/null | tail -1
Create the working directory:
mkdir -p .canary/baselines .canary/reports .canary/screenshots
Phase 2: Baseline Capture (--baseline mode)
Run this BEFORE deploying to capture the current healthy state.
For each page to monitor, record:
- HTTP status: is the page returning 200?
- Response time: how long does it take to load?
- Content snapshot: key text content to detect blank pages later
# For each page URL
for PAGE_PATH in "/" "/dashboard" "/settings" "/api/health"; do
SLUG=$(echo "$PAGE_PATH" | tr '/' '_' | tr -d '?&=')
RESULT=$(curl -sf -o /dev/null -w "%{http_code}|%{time_total}" "${BASE_URL}${PAGE_PATH}" 2>/dev/null)
STATUS=$(echo "$RESULT" | cut -d'|' -f1)
TIME_MS=$(echo "$RESULT" | awk -F'|' '{printf "%.0f", $2 * 1000}')
echo " ${PAGE_PATH}: HTTP ${STATUS}, ${TIME_MS}ms"
done
Save baseline to .canary/baselines/baseline.json:
{
"url": "<base-url>",
"timestamp": "<ISO-8601>",
"branch": "<branch-name>",
"commit": "<git-SHA>",
"pages": {
"/": { "status": 200, "time_ms": 450 },
"/dashboard": { "status": 200, "time_ms": 680 },
"/api/health": { "status": 200, "time_ms": 45 }
}
}
Then STOP and tell the user: "Baseline captured. Deploy your changes, then run /canary <url> to monitor."
Phase 3: Page Discovery
If no pages were specified, auto-discover pages to monitor.
From the application:
# Check sitemap if available
curl -sf "${URL}/sitemap.xml" 2>/dev/null | grep -oP '(?<=<loc>)[^<]+' | head -10
# Check robots.txt for known paths
curl -sf "${URL}/robots.txt" 2>/dev/null | grep -i "allow\|disallow" | head -10
# Common paths to always check
echo "Always check: / /login /dashboard /settings /api/health"
Default pages to monitor if nothing found: /, and the homepage only.
Phase 4: Monitoring Loop
Monitor for the specified duration (default: 10 minutes). Run a check every 60 seconds.
Each check cycle:
TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%SZ)
CHECK_NUM=$((CHECK_NUM + 1))
for PAGE_PATH in "${PAGES[@]}"; do
# Check HTTP status and response time
RESULT=$(curl -sf -o /dev/null -w "%{http_code}|%{time_total}" \
--max-time 10 "${BASE_URL}${PAGE_PATH}" 2>/dev/null || echo "0|0")
STATUS=$(echo "$RESULT" | cut -d'|' -f1)
TIME_MS=$(echo "$RESULT" | awk -F'|' '{printf "%.0f", $2 * 1000}')
# Compare against baseline
BASELINE_STATUS=$(jq -r ".pages[\"${PAGE_PATH}\"].status // 200" .canary/baselines/baseline.json 2>/dev/null)
BASELINE_TIME=$(jq -r ".pages[\"${PAGE_PATH}\"].time_ms // 1000" .canary/baselines/baseline.json 2>/dev/null)
echo " [Check #${CHECK_NUM}] ${PAGE_PATH}: HTTP ${STATUS} (${TIME_MS}ms)"
done
Alert levels:
| Level | Condition | Trigger |
|---|---|---|
| CRITICAL | Page load failure | HTTP status is not 2xx, curl timeout, DNS failure |
| HIGH | New errors | Error rate increased vs baseline (console errors, 5xx responses) |
| MEDIUM | Performance regression | Response time exceeds 2x baseline |
| LOW | New broken links | Previously-working routes now return 404 |
Key principles:
- Alert on changes, not absolutes. A page with 3 errors in baseline is fine if still 3. One NEW error is an alert.
- Transient tolerance. Only alert on patterns persisting across 2+ consecutive checks. A single network blip is not an alert.
When a CRITICAL or HIGH alert fires (2 consecutive checks):
CANARY ALERT
=====================================
Time: [check #N at Xs elapsed]
Page: [URL]
Level: [CRITICAL / HIGH / MEDIUM / LOW]
Finding: [what changed, be specific]
Baseline: [baseline value]
Current: [current value]
=====================================
Options:
A) Investigate now: stop monitoring, focus on this issue
B) Continue monitoring: wait for next check to confirm
C) Rollback: revert the deploy
D) Dismiss: known issue, continue monitoring
Phase 5: Health Report
After monitoring completes (or user stops), produce a summary.
CANARY REPORT: [url]
=========================================
Duration: [X minutes]
Checks: [N total per page]
Pages: [N pages monitored]
Commit: [deployed SHA]
Status: [HEALTHY / DEGRADED / BROKEN]
Per-Page Results:
-----------------------------------------
Page Status Avg Time Alerts
/ HEALTHY 450ms 0
/dashboard DEGRADED 1100ms 1 medium (was 450ms)
/settings HEALTHY 380ms 0
/api/health HEALTHY 45ms 0
Alerts Fired: [N] (X critical, Y high, Z medium, W low)
VERDICT: [DEPLOY HEALTHY / DEPLOY HAS ISSUES (see alerts above)]
=========================================
Save report to .canary/reports/<date>-canary.md.
Phase 6: Baseline Update
If the deploy is healthy and the user wants to update the baseline:
cp .canary/reports/latest-snapshot.json .canary/baselines/baseline.json
echo "Baseline updated to commit $(git rev-parse --short HEAD)"
Output Format
See Phase 5 above for the full CANARY REPORT template.
Inline alert format (during monitoring):
[08:42:15] Check #3: /dashboard: ALERT HIGH, response time 1250ms (baseline: 420ms)
[08:43:15] Check #4: /dashboard: ALERT HIGH, response time 1180ms (baseline: 420ms)
-> Consistent across 2 checks. Firing alert.
Usage
/canary https://app.example.com # Monitor homepage for 10 min
/canary https://app.example.com --baseline # Capture baseline before deploying
/canary https://app.example.com --duration 5m # Monitor for 5 minutes
/canary https://app.example.com --quick # Single-pass health check (no loop)
/canary https://app.example.com --pages /,/dashboard,/api/health
Tips
- Always capture a baseline before deploying to production: run
/canary <url> --baseline - Start monitoring immediately after deploy; the first 5 minutes catch 90% of regressions
- CRITICAL alerts require immediate investigation; don't wait for the monitoring to finish
- MEDIUM alerts (performance) may be cache warming; give it 2-3 more checks before acting
- Keep
.canary/baselines/in git so any team member can run canary against the same baseline
Related Commands
/ship: pre-deploy checklist (run before deploying)/land-and-deploy: full merge-to-verify pipeline (runs canary automatically)/qa: interactive QA testing before shipping
$ARGUMENTS