Incidents View
Track and manage production incidents across your Meteor application.
Overview
The Incidents view gives you a central place to manage production incidents, whether they originate from alert triggers or from manual reports by your team. You can find it in the sidebar under Incidents.
Each incident tracks:
- What went wrong
- Current investigation status
- Who's involved
- Timeline of updates and status changes
Severity Levels
Incidents use four severity levels:
| Severity | Description | Typical Response |
|---|---|---|
| SEV1 | Critical | All hands on deck, customer-facing impact |
| SEV2 | Major | Significant degradation, needs immediate attention |
| SEV3 | Minor | Limited impact, can be addressed during business hours |
| SEV4 | Low | Minimal impact, fix when convenient |
Choose the severity that matches the actual user impact. You can always change it later as you learn more about the issue.
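If you mirror incident data in your own tooling, the severity table above can be modelled as a small lookup. The following is a minimal sketch, not part of SkySignal's API: the names `Severity`, `SEVERITY_INFO`, `severityRank`, and `sortBySeverity` are hypothetical and exist only for illustration.

```typescript
// Illustrative model only; these names are assumptions, not SkySignal's API.
type Severity = "SEV1" | "SEV2" | "SEV3" | "SEV4";

interface SeverityInfo {
  label: string;
  typicalResponse: string;
}

// Mirrors the severity table in this section.
const SEVERITY_INFO: Record<Severity, SeverityInfo> = {
  SEV1: { label: "Critical", typicalResponse: "All hands on deck, customer-facing impact" },
  SEV2: { label: "Major", typicalResponse: "Significant degradation, needs immediate attention" },
  SEV3: { label: "Minor", typicalResponse: "Limited impact, can be addressed during business hours" },
  SEV4: { label: "Low", typicalResponse: "Minimal impact, fix when convenient" },
};

// Lower number means more severe, so SEV1 ranks first.
function severityRank(severity: Severity): number {
  return Number(severity.slice(3));
}

// Returns a new array ordered from most to least severe.
function sortBySeverity<T extends { severity: Severity }>(items: T[]): T[] {
  return [...items].sort((a, b) => severityRank(a.severity) - severityRank(b.severity));
}
```

Keeping severity as a closed union type means that a typo such as `"SEV5"` is rejected at compile time rather than surfacing during an incident.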
Incident Lifecycle
Every incident moves through a defined lifecycle:
Investigating → Identified → Monitoring → Resolved → Postmortem
Status Descriptions
- Investigating - You know something is wrong but haven't pinpointed the cause yet
- Identified - Root cause found, working on a fix
- Monitoring - Fix deployed, watching to confirm it holds
- Resolved - Issue confirmed fixed, normal operation restored
- Postmortem - Post-resolution analysis phase
Status transitions are tracked in the incident timeline, so you always have a record of when each phase started.
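The lifecycle can be thought of as a simple ordered state machine. The sketch below is one way to model it in your own code. It assumes that an incident advances one step at a time in the documented order, which this page does not state explicitly, and every name in it (`LIFECYCLE`, `StatusChange`, `nextStatus`, `advance`) is hypothetical.

```typescript
// Illustrative model only; not SkySignal's actual implementation.
const LIFECYCLE = ["Investigating", "Identified", "Monitoring", "Resolved", "Postmortem"] as const;
type Status = (typeof LIFECYCLE)[number];

interface StatusChange {
  from: Status;
  to: Status;
  at: Date;
}

// Assumption: statuses only move forward, one step at a time.
function nextStatus(current: Status): Status | null {
  const index = LIFECYCLE.indexOf(current);
  return LIFECYCLE[index + 1] ?? null;
}

// Moves to the next status and records the transition with its timestamp,
// so the history shows when each phase started.
function advance(current: Status, history: StatusChange[], at: Date = new Date()): Status {
  const to = nextStatus(current);
  if (to === null) {
    throw new Error(`${current} is the final status`);
  }
  history.push({ from: current, to, at });
  return to;
}
```

Recording each transition at the moment it happens is what makes it possible to answer "when did we identify the cause?" later without relying on memory.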
Creating Incidents
From Alerts
When an alert fires, SkySignal can automatically create an incident. The alert's severity, metric data, and context are carried over to the incident so you don't have to re-enter anything.
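Conceptually, the carry-over looks like a mapping from alert fields to incident fields. The sketch below is purely illustrative: the `AlertTrigger` and `IncidentDraft` shapes, their field names, and the assumption that alerts use the same SEV1 to SEV4 scale are all hypothetical and not taken from SkySignal's data model.

```typescript
// Hypothetical shapes for illustration; field names are assumptions.
type AlertSeverity = "SEV1" | "SEV2" | "SEV3" | "SEV4";

interface AlertTrigger {
  ruleName: string;
  severity: AlertSeverity;
  metric: string;
  value: number;
  threshold: number;
  firedAt: Date;
}

interface IncidentDraft {
  title: string;
  severity: AlertSeverity;
  description: string;
  status: "Investigating";
  source: "alert" | "manual";
  createdAt: Date;
}

// Copies severity, metric data, and context from the alert into a new incident draft.
function incidentFromAlert(alert: AlertTrigger): IncidentDraft {
  return {
    title: `Alert: ${alert.ruleName}`,
    severity: alert.severity,
    description: `${alert.metric} was ${alert.value} (threshold ${alert.threshold})`,
    status: "Investigating",
    source: "alert",
    createdAt: alert.firedAt,
  };
}
```

For example, an alert on a `responseTime` metric with value 1200 and threshold 500 would yield a draft described as `responseTime was 1200 (threshold 500)`.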
Manually
1. Navigate to Incidents in the sidebar
2. Click Create Incident
3. Fill in the details:
   - Title - Short description of the issue
   - Severity - SEV1 through SEV4
   - Description - What you know so far
Manual creation is useful for issues reported by users or caught during manual testing that haven't triggered any alerts.
Incident Timeline
Each incident has a timeline that records:
- Status changes (including who made the change and when)
- Updates and notes added by team members
- Related alert triggers
- Resolution details
The timeline is the source of truth for what happened during an incident. It's especially valuable during postmortem analysis when you need to reconstruct the sequence of events.
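To show how a timeline supports that reconstruction, here is a minimal sketch that computes how long an incident spent in each status. The `TimelineEntry` shape and the `phaseDurations` helper are hypothetical; how SkySignal actually stores or exports timeline data is not covered here.

```typescript
// Hypothetical timeline model; entry shapes are assumptions for illustration.
type StatusChangeEntry = { kind: "status_change"; at: Date; by: string; to: string };
type NoteEntry = { kind: "note"; at: Date; by: string; text: string };
type AlertEntry = { kind: "alert_trigger"; at: Date; alertName: string };
type TimelineEntry = StatusChangeEntry | NoteEntry | AlertEntry;

// Returns the minutes spent in each status. Each phase runs from its own
// status change until the next one; the last phase runs until `end`.
function phaseDurations(timeline: TimelineEntry[], end: Date): Record<string, number> {
  const changes = timeline
    .filter((entry): entry is StatusChangeEntry => entry.kind === "status_change")
    .sort((a, b) => a.at.getTime() - b.at.getTime());

  const minutes: Record<string, number> = {};
  changes.forEach((change, index) => {
    const next = changes[index + 1];
    const until = next ? next.at : end;
    const elapsed = (until.getTime() - change.at.getTime()) / 60000;
    minutes[change.to] = (minutes[change.to] ?? 0) + elapsed;
  });
  return minutes;
}
```

Notes and alert triggers are ignored by this calculation, but they stay in the same list so that a postmortem can read them in context alongside the status changes.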
Postmortem Analysis
Once an incident is resolved, move it to Postmortem status to document what happened:
- Root cause - What actually went wrong
- Impact - How many users were affected and for how long
- Timeline review - Key events and decisions during the incident
- Action items - What to do to prevent recurrence
Postmortems are optional but recommended for SEV1 and SEV2 incidents. They help the team learn from incidents and improve reliability over time.
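If your team keeps postmortems in its own tooling, the four sections above map naturally onto a small record type. This is a sketch under that assumption only: the `Postmortem` shape and the `postmortemRecommended` and `missingSections` helpers are hypothetical and not part of SkySignal.

```typescript
// Hypothetical postmortem record; field names are assumptions for illustration.
interface Postmortem {
  rootCause: string;
  impact: { usersAffected: number; durationMinutes: number };
  timelineReview: string[];
  actionItems: string[];
}

// Encodes the recommendation above: write a postmortem for SEV1 and SEV2.
function postmortemRecommended(severity: "SEV1" | "SEV2" | "SEV3" | "SEV4"): boolean {
  return severity === "SEV1" || severity === "SEV2";
}

// Lists the sections of a draft that are still empty.
function missingSections(draft: Partial<Postmortem>): string[] {
  const missing: string[] = [];
  if (!draft.rootCause?.trim()) missing.push("rootCause");
  if (!draft.impact) missing.push("impact");
  if (!draft.timelineReview?.length) missing.push("timelineReview");
  if (!draft.actionItems?.length) missing.push("actionItems");
  return missing;
}
```

A check like `missingSections` is handy as a gate before a postmortem is marked complete, since action items tend to be the section most often left blank.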
Next Steps
- Alerts - Configure alerts that create incidents automatically
- Alerting Guide - Set up alert rules and notification channels
- Errors View - Investigate errors related to incidents