name: oncall-handoff description: Generate a comprehensive on-call handoff document by aggregating open incidents, ongoing issues, recent deployments, and systems to watch. Orchestrates PagerDuty, Jira, and ArgoCD agents. Use during on-call rotation changes or shift handoffs.
On-Call Handoff
Build a structured handoff document for the incoming on-call engineer by collecting data from PagerDuty, Jira, and ArgoCD.
Instructions
Phase 1: Current Incident State (PagerDuty Agent)
- Active incidents - all triggered and acknowledged incidents
- Recently resolved incidents - resolved in the last 24 hours (may recur)
- Current on-call schedule - who is on-call now, who is taking over
- Upcoming maintenance windows - scheduled within the next 48 hours
Phase 2: Ongoing Issues (Jira Agent)
- Open incident tickets - Jira issues labeled
incident,outage,p0,p1 - Known issues - issues labeled
known-issueorworkaround - Recently closed issues - resolved in last 48 hours that may have follow-ups
- Pending changes - tickets in "Ready for Deploy" or "In Review" status
Phase 3: Environment State (ArgoCD Agent)
- Recent deployments - applications synced in the last 48 hours
- Unhealthy applications - any OutOfSync, Degraded, or Unknown apps
- Pending syncs - applications with pending changes not yet deployed
- Recent rollbacks - any applications that were rolled back
Phase 4: Compile Handoff Document
Organize all data into a structured, scannable document with clear action items.
Output Format
## On-Call Handoff Document
**Date**: February 9, 2026
**Outgoing**: @engineer-a (Feb 3 - Feb 9)
**Incoming**: @engineer-b (Feb 9 - Feb 16)
---
### Active Incidents (Action Required)
| Incident | Service | Urgency | Duration | Status |
|----------|---------|---------|----------|--------|
| INC-789 | auth-service | High | 2h 15m | Acknowledged |
**INC-789 Context**: Auth service intermittent 503 errors. Root cause suspected to be database connection pool exhaustion. DBA team engaged. Workaround: restart auth-service pods if error rate exceeds 10%.
### Recently Resolved (Watch For Recurrence)
| Incident | Service | Resolved | Duration | Root Cause |
|----------|---------|----------|----------|------------|
| INC-785 | payment-api | Feb 8 18:00 | 45m | Memory leak in v2.3.1 |
### Known Issues & Workarounds
1. **AUTH-456**: Auth service connection pool - restart pods if needed (ETA fix: Feb 12)
2. **PLAT-789**: Flaky integration tests - ignore `test_streaming` failures (known issue)
### Recent Deployments (Last 48h)
| Application | Version | Deployed | Status |
|-------------|---------|----------|--------|
| payment-api | v2.3.2 (hotfix) | Feb 8 19:00 | Healthy |
| auth-service | v1.8.0 | Feb 7 14:00 | Degraded |
### Unhealthy Applications
| Application | Sync Status | Health | Since |
|-------------|-------------|--------|-------|
| auth-service | Synced | Degraded | Feb 8 16:00 |
### Pending Changes (Not Yet Deployed)
- **monitoring-stack**: Prometheus alerting rule updates (PR #234 approved)
- **api-gateway**: Rate limiting config change (scheduled for Feb 10)
### Systems to Watch
1. **auth-service** - Connection pool issue ongoing, monitor error rates
2. **payment-api** - Hotfix deployed yesterday, watch for regression
3. **EKS cluster-prod** - Node scaling event expected during peak hours (10am-2pm)
### Escalation Contacts
| Team | Primary | Secondary |
|------|---------|-----------|
| Platform | @engineer-c | @engineer-d |
| DBA | @dba-primary | @dba-secondary |
| Security | @sec-oncall | - |
Examples
- "Generate an on-call handoff document"
- "Prepare a shift handoff for the incoming on-call engineer"
- "What should the next on-call person know about?"
- "Summarize the current state of production for handoff"
Guidelines
- Prioritize actionable information - what does the incoming engineer need to DO?
- Include workarounds for known issues so the incoming engineer does not have to search
- Mark items as "action required" vs "monitor" vs "FYI" for clear prioritization
- Always include escalation contacts with team context
- Keep the last 48 hours as the default lookback window for context
- If no active incidents exist, say so explicitly (this is good news)
- Include any scheduled maintenance that falls within the on-call period