---
name: incident-response
description: How Apollo.io handles incidents (detection, response, escalation, and postmortem)
---
# Apollo Incident Response
## Severity Levels
| Severity | Definition | Response Time |
|---|---|---|
| SEV-1 | Customer-facing outage or data loss | Immediate |
| SEV-2 | Significant degradation, limited blast radius | 15 min |
| SEV-3 | Minor issue, workaround available | 1 hour |
| SEV-4 | Low impact, no customer effect | Next business day |
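To make the response-time column concrete, here is a minimal Python sketch that turns a severity and a declaration time into a response deadline. The `Severity` enum and `response_deadline` function are hypothetical, and reading "Next business day" as 09:00 on the following weekday (no holiday calendar, same time zone as the input) is an assumption, not part of this policy.

```python
from datetime import datetime, timedelta
from enum import Enum

# Hypothetical helpers; the timedelta values mirror the severity table above.


class Severity(Enum):
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3
    SEV4 = 4


# "Immediate" is modeled as zero delay. SEV-4 is handled in the function below.
RESPONSE_TIME = {
    Severity.SEV1: timedelta(0),
    Severity.SEV2: timedelta(minutes=15),
    Severity.SEV3: timedelta(hours=1),
}


def response_deadline(sev: Severity, declared_at: datetime) -> datetime:
    """Return the latest time by which a responder should be engaged."""
    if sev is Severity.SEV4:
        # Assumption: "next business day" = 09:00 on the next Monday-Friday date,
        # ignoring holidays; the input's tzinfo (if any) is preserved.
        day = declared_at.date() + timedelta(days=1)
        while day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
            day += timedelta(days=1)
        return datetime(day.year, day.month, day.day, 9, 0, tzinfo=declared_at.tzinfo)
    return declared_at + RESPONSE_TIME[sev]


if __name__ == "__main__":
    t = datetime(2024, 1, 5, 10, 0)  # a Friday
    print(response_deadline(Severity.SEV2, t))  # 2024-01-05 10:15:00
    print(response_deadline(Severity.SEV4, t))  # 2024-01-08 09:00:00 (Monday)
```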
## How to Declare an Incident
- Post in #incidents Slack channel
- Page the on-call engineer via PagerDuty (or your team's paging tool)
- Join the incident bridge: [link or instructions]
- Assign an Incident Commander (IC)
## On-Call Rotation
- Rotations managed via [PagerDuty / OpsGenie / other]
- Each team has its own on-call schedule
- Escalation path: on-call engineer → team lead → engineering manager
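The escalation path is a fixed, ordered chain, so a paging bot or runbook script could encode it as a simple list. This is a hypothetical Python sketch (`ESCALATION_PATH` and `next_escalation` are illustrative names); this document does not say when to escalate, so the sketch leaves the timing decision to the caller.

```python
from typing import Optional

# Hypothetical encoding of the escalation path above, lowest level first.
ESCALATION_PATH = ["on-call engineer", "team lead", "engineering manager"]


def next_escalation(current: str) -> Optional[str]:
    """Return the next role to escalate to, or None if `current` is the last level.

    Raises ValueError if `current` is not on the path.
    """
    idx = ESCALATION_PATH.index(current)
    if idx + 1 < len(ESCALATION_PATH):
        return ESCALATION_PATH[idx + 1]
    return None
```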
## During an Incident
- The IC coordinates communication and the overall response; they are not necessarily the person implementing the fix
- Post an update in #incidents every 15 minutes for SEV-1 and SEV-2 incidents
- Post customer-facing status updates to [status page]
- Keep a running timeline in the incident doc
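The 15-minute update rule and the running timeline can be tracked together. Below is a minimal Python sketch; `IncidentLog`, its fields, and the choice to count the declaration post as the first update are assumptions for illustration, not an existing Apollo tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Tuple

# Hypothetical tracker; only the 15-minute SEV-1/2 cadence comes from this document.
UPDATE_INTERVAL = timedelta(minutes=15)


@dataclass
class IncidentLog:
    severity: int  # 1-4, matching SEV-1 through SEV-4
    declared_at: datetime
    timeline: List[Tuple[datetime, str]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Assumption: the declaration post in #incidents counts as the first update.
        self.last_update = self.declared_at
        self.timeline.append((self.declared_at, "Incident declared"))

    def record(self, at: datetime, note: str, posted_to_channel: bool = False) -> None:
        """Append a timeline entry; mark it as a #incidents update if it was posted there."""
        self.timeline.append((at, note))
        if posted_to_channel:
            self.last_update = at

    def update_due(self, now: datetime) -> bool:
        """True if a SEV-1/2 incident has gone 15 minutes or more without an update."""
        if self.severity > 2:
            return False  # the 15-minute cadence applies only to SEV-1 and SEV-2
        return now - self.last_update >= UPDATE_INTERVAL
```

Keeping the timeline as timestamped entries also makes it easy to copy into the postmortem doc later.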
## Postmortem Process
- Postmortem doc created within 48 hours of resolution
- Blameless — focus on systems and processes, not people
- Reviewed in weekly incident review meeting
- Action items tracked to completion
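The 48-hour postmortem deadline can be checked mechanically, for example from a scheduled job. A minimal Python sketch follows; `postmortem_due_by` and `postmortem_overdue` are hypothetical names, and how "doc created" is detected is left to whatever system stores postmortem docs.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical helpers; only the 48-hour window comes from this document.
POSTMORTEM_WINDOW = timedelta(hours=48)


def postmortem_due_by(resolved_at: datetime) -> datetime:
    """Deadline for creating the postmortem doc."""
    return resolved_at + POSTMORTEM_WINDOW


def postmortem_overdue(resolved_at: datetime, now: datetime,
                       doc_created_at: Optional[datetime] = None) -> bool:
    """True if no postmortem doc exists yet and the 48-hour window has passed."""
    if doc_created_at is not None:
        return False  # a doc exists; whether it was created on time is a separate check
    return now > postmortem_due_by(resolved_at)
```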
## Contacts
- #incidents — active incident coordination
- #postmortems — postmortem docs and reviews
- On-call escalation: [PagerDuty link or escalation path]