
Incidents View

Track and manage production incidents across your Meteor application.

Overview

The Incidents view gives you a central place to manage production incidents, whether they come from alert triggers or from manual reports by your team. You can find it in the sidebar under Incidents.

Each incident tracks:

  • What went wrong
  • Current investigation status
  • Who's involved
  • Timeline of updates and status changes
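If you mirror incident data in your own tooling, it can help to picture each incident as a small record. The TypeScript sketch below is hypothetical: the `Incident`, `TimelineEntry`, and `addNote` names and fields are illustrative assumptions, not SkySignal's actual schema or API.

```typescript
// Hypothetical model of the four things an incident tracks.
// Names and fields are illustrative assumptions, not SkySignal's schema.
type IncidentStatus =
  | "Investigating"
  | "Identified"
  | "Monitoring"
  | "Resolved"
  | "Postmortem";

interface TimelineEntry {
  at: Date;        // when the update or change happened
  author: string;  // who made it
  message: string; // free-text note or status change summary
}

interface Incident {
  title: string;             // what went wrong, in short
  description: string;       // what went wrong, in detail
  status: IncidentStatus;    // current investigation status
  participants: string[];    // who's involved (kept free of duplicates)
  timeline: TimelineEntry[]; // updates and status changes, in insertion order
}

// Append a note to the timeline and record its author as a participant.
function addNote(incident: Incident, author: string, message: string, at: Date = new Date()): void {
  incident.timeline.push({ at, author, message });
  if (incident.participants.indexOf(author) === -1) {
    incident.participants.push(author);
  }
}
```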

Severity Levels

Incidents use four severity levels:

Severity | Description | Typical Response
SEV1     | Critical    | All hands on deck, customer-facing impact
SEV2     | Major       | Significant degradation, needs immediate attention
SEV3     | Minor       | Limited impact, can be addressed during business hours
SEV4     | Low         | Minimal impact, fix when convenient

Choose the severity that matches the actual user impact. You can always change it later as you learn more about the issue.
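The severity table can also be expressed as data, which makes ordering a list of incidents by severity straightforward. This is a minimal sketch: only the SEV1 to SEV4 labels and their descriptions come from the table, while `SEVERITY_INFO`, `severityRank`, and `compareBySeverity` are hypothetical names.

```typescript
// The severity table as data. Labels and text come from the table above;
// the object shape and helper names are hypothetical.
type Severity = "SEV1" | "SEV2" | "SEV3" | "SEV4";

const SEVERITY_INFO: Record<Severity, { description: string; typicalResponse: string }> = {
  SEV1: { description: "Critical", typicalResponse: "All hands on deck, customer-facing impact" },
  SEV2: { description: "Major", typicalResponse: "Significant degradation, needs immediate attention" },
  SEV3: { description: "Minor", typicalResponse: "Limited impact, can be addressed during business hours" },
  SEV4: { description: "Low", typicalResponse: "Minimal impact, fix when convenient" },
};

// Lower number means more severe: "SEV1" -> 1, "SEV4" -> 4.
function severityRank(s: Severity): number {
  return Number(s.slice(3));
}

// Comparator that puts the most severe incidents first.
function compareBySeverity(a: Severity, b: Severity): number {
  return severityRank(a) - severityRank(b);
}
```

Because severity can change as you learn more, sorting at display time (rather than storing a fixed order) keeps the list accurate.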

Incident Lifecycle

Every incident moves through a defined lifecycle:

Investigating → Identified → Monitoring → Resolved → Postmortem

Status Descriptions

  • Investigating - You know something is wrong but haven't pinpointed the cause yet
  • Identified - Root cause found, working on a fix
  • Monitoring - Fix deployed, watching to confirm it holds
  • Resolved - Issue confirmed fixed, normal operation restored
  • Postmortem - Post-resolution analysis phase

Status transitions are tracked in the incident timeline, so you always have a record of when each phase started.
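One way to picture the lifecycle is as an ordered state machine in which every transition produces a timeline entry. The sketch assumes statuses advance one step at a time in the order shown; SkySignal may permit other transitions (for example, moving back to Investigating), which this sketch does not model. `LIFECYCLE`, `StatusChange`, `nextStatus`, and `advance` are hypothetical names.

```typescript
// Assumption: statuses only move forward, one step at a time.
// The real product may allow other transitions; this sketch does not.
const LIFECYCLE = ["Investigating", "Identified", "Monitoring", "Resolved", "Postmortem"] as const;
type Status = (typeof LIFECYCLE)[number];

interface StatusChange {
  from: Status;
  to: Status;
  by: string; // who made the change
  at: Date;   // when the new phase started
}

// The status after `current`, or undefined once Postmortem is reached.
function nextStatus(current: Status): Status | undefined {
  const i = LIFECYCLE.indexOf(current);
  return i + 1 < LIFECYCLE.length ? LIFECYCLE[i + 1] : undefined;
}

// Advance one step and append the change to the timeline.
function advance(current: Status, by: string, timeline: StatusChange[], at: Date = new Date()): Status {
  const to = nextStatus(current);
  if (to === undefined) {
    throw new Error(`${current} is the final status`);
  }
  timeline.push({ from: current, to, by, at });
  return to;
}
```

Since each `StatusChange` stores `at`, the start time of any phase is the `at` of the change that entered it.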

Creating Incidents

From Alerts

When an alert fires, SkySignal can automatically create an incident. The alert's severity, metric data, and context are carried over to the incident so you don't have to re-enter anything.
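Conceptually, alert-driven creation maps an alert trigger onto a new incident record. The sketch below is illustrative only: `AlertTrigger`, `IncidentDraft`, and `incidentFromAlert` are hypothetical shapes and names, since the real alert payload isn't described on this page.

```typescript
// Hypothetical shapes; SkySignal's real alert payload may differ.
type Severity = "SEV1" | "SEV2" | "SEV3" | "SEV4";

interface AlertTrigger {
  alertName: string;
  severity: Severity;
  metric: { name: string; value: number; threshold: number };
  context: Record<string, string>; // e.g. host or method name (illustrative)
  firedAt: Date;
}

interface IncidentDraft {
  title: string;
  severity: Severity;
  status: "Investigating"; // new incidents start at the beginning of the lifecycle
  description: string;
  relatedAlerts: AlertTrigger[];
  openedAt: Date;
}

// Carry the alert's severity, metric data, and context into a new incident
// so nothing has to be re-entered by hand.
function incidentFromAlert(alert: AlertTrigger): IncidentDraft {
  const { name, value, threshold } = alert.metric;
  const contextLines = Object.keys(alert.context).map((key) => `${key}: ${alert.context[key]}`);
  return {
    title: alert.alertName,
    severity: alert.severity,
    status: "Investigating",
    description: [`${name} = ${value} (threshold ${threshold})`].concat(contextLines).join("\n"),
    relatedAlerts: [alert],
    openedAt: alert.firedAt,
  };
}
```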

Manually

  1. Navigate to Incidents in the sidebar
  2. Click Create Incident
  3. Fill in the details:
    • Title - Short description of the issue
    • Severity - SEV1 through SEV4
    • Description - What you know so far

Manual creation is useful for issues reported by users or caught during manual testing that haven't triggered any alerts.
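If you build your own form or script for manual reports, a small validation step keeps the three fields consistent. The rules here (a non-empty title, a 120-character title limit, and a severity from SEV1 to SEV4) are assumptions for illustration, and `validateManualIncident` is a hypothetical helper, not part of SkySignal.

```typescript
// Hypothetical validation for the manual-creation fields.
// The length limit is an assumed value, not a documented SkySignal constraint.
type Severity = "SEV1" | "SEV2" | "SEV3" | "SEV4";

interface ManualIncidentInput {
  title: string;       // short description of the issue
  severity: string;    // raw form value, checked below
  description: string; // what you know so far (may be empty early on)
}

const SEVERITIES: readonly Severity[] = ["SEV1", "SEV2", "SEV3", "SEV4"];
const MAX_TITLE_LENGTH = 120; // assumed limit

// Returns a list of problems; an empty list means the input is valid.
function validateManualIncident(input: ManualIncidentInput): string[] {
  const errors: string[] = [];
  const title = input.title.trim();
  if (title.length === 0) errors.push("Title is required");
  if (title.length > MAX_TITLE_LENGTH) errors.push(`Title must be at most ${MAX_TITLE_LENGTH} characters`);
  if (SEVERITIES.indexOf(input.severity as Severity) === -1) errors.push("Severity must be SEV1 through SEV4");
  return errors;
}
```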

Incident Timeline

Each incident has a timeline that records:

  • Status changes (with who made the change and when)
  • Updates and notes added by team members
  • Related alert triggers
  • Resolution details

The timeline is the source of truth for what happened during an incident. It's especially valuable during postmortem analysis when you need to reconstruct the sequence of events.
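Reconstructing the sequence of events amounts to sorting timeline entries by timestamp. This sketch models the four entry kinds listed above as a TypeScript discriminated union; the type, its fields, and the `describeEntry` and `chronological` helpers are hypothetical.

```typescript
// Hypothetical timeline entry kinds, mirroring the list above.
type TimelineEntry =
  | { kind: "status"; at: Date; by: string; from: string; to: string }
  | { kind: "note"; at: Date; by: string; text: string }
  | { kind: "alert"; at: Date; alertName: string }
  | { kind: "resolution"; at: Date; by: string; summary: string };

// One-line, human-readable description of an entry.
function describeEntry(e: TimelineEntry): string {
  switch (e.kind) {
    case "status":
      return `${e.by} changed status ${e.from} -> ${e.to}`;
    case "note":
      return `${e.by}: ${e.text}`;
    case "alert":
      return `Alert fired: ${e.alertName}`;
    case "resolution":
      return `${e.by} resolved: ${e.summary}`;
  }
}

// Entries in chronological order, without mutating the input array.
function chronological(entries: TimelineEntry[]): string[] {
  return entries
    .slice()
    .sort((a, b) => a.at.getTime() - b.at.getTime())
    .map((e) => `${e.at.toISOString()} ${describeEntry(e)}`);
}
```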

Postmortem Analysis

Once an incident is resolved, move it to Postmortem status to document what happened:

  • Root cause - What actually went wrong
  • Impact - How many users were affected, for how long
  • Timeline review - Key events and decisions during the incident
  • Action items - What to do to prevent recurrence

Postmortems are optional but recommended for SEV1 and SEV2 incidents. They help the team learn from incidents and improve reliability over time.
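A postmortem can be modeled as a record with the four sections above plus a few helpers. This is a sketch under assumptions: the `Postmortem` fields, `impactMinutes`, and `openActionItems` are hypothetical, while `postmortemRecommended` simply encodes this page's SEV1/SEV2 recommendation.

```typescript
// Hypothetical postmortem record; field names are not SkySignal's schema.
type Severity = "SEV1" | "SEV2" | "SEV3" | "SEV4";

interface Postmortem {
  rootCause: string;
  impact: { usersAffected: number; startedAt: Date; resolvedAt: Date };
  timelineReview: string[]; // key events and decisions
  actionItems: { description: string; owner: string; done: boolean }[];
}

// Postmortems are optional, but recommended for SEV1 and SEV2.
function postmortemRecommended(severity: Severity): boolean {
  return severity === "SEV1" || severity === "SEV2";
}

// Length of user impact, rounded to the nearest minute.
function impactMinutes(p: Postmortem): number {
  const ms = p.impact.resolvedAt.getTime() - p.impact.startedAt.getTime();
  return Math.round(ms / 60000);
}

// Action items still open, formatted as "description (owner)".
function openActionItems(p: Postmortem): string[] {
  return p.actionItems.filter((a) => !a.done).map((a) => `${a.description} (${a.owner})`);
}
```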

Next Steps

  • Alerts - Configure alerts that create incidents automatically
  • Alerting Guide - Set up alert rules and notification channels
  • Errors View - Investigate errors related to incidents