Alerts

Get notified when something goes wrong with your application.

Alert Types

Performance Alerts

  • Method response time exceeds threshold
  • Error rate exceeds threshold
  • Throughput drops below threshold

System Alerts

  • CPU usage exceeds threshold
  • Memory usage exceeds threshold
  • Event loop lag exceeds threshold

Error Alerts

  • New error type detected
  • Error count exceeds threshold
  • Specific error pattern matches

Anomaly Detection

SkySignal includes automated anomaly detection that establishes rolling baselines for key metrics: response time, error rate, CPU usage, memory usage, requests per hour, and active sessions.

Instead of manually setting fixed thresholds, anomaly detection learns what "normal" looks like for your app and flags deviations automatically. When a metric moves significantly away from its baseline, SkySignal generates an anomaly event with context like:

cpu_usage is 3.3 standard deviations above normal
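
SkySignal's internal detection logic isn't documented here, but the general approach is a rolling baseline: track the mean and standard deviation of recent samples for each metric and flag readings that drift too many standard deviations away. The TypeScript sketch below only illustrates that idea; the function names and the three-standard-deviation cutoff are assumptions, not SkySignal's actual API.

// Illustrative only: not SkySignal's implementation.
// A rolling baseline is the mean/standard deviation of recent samples;
// a reading is flagged when it drifts too many standard deviations away.

interface Baseline {
  mean: number;
  stdDev: number;
}

function computeBaseline(samples: number[]): Baseline {
  const mean = samples.reduce((sum, x) => sum + x, 0) / samples.length;
  const variance =
    samples.reduce((sum, x) => sum + (x - mean) ** 2, 0) / samples.length;
  return { mean, stdDev: Math.sqrt(variance) };
}

// Returns a human-readable anomaly message, or null if the value looks normal.
function checkAnomaly(
  metric: string,
  value: number,
  baseline: Baseline,
  threshold = 3 // standard deviations (assumed cutoff)
): string | null {
  if (baseline.stdDev === 0) return null; // flat baseline, nothing to compare against
  const z = (value - baseline.mean) / baseline.stdDev;
  if (Math.abs(z) < threshold) return null;
  const direction = z > 0 ? "above" : "below";
  return `${metric} is ${Math.abs(z).toFixed(1)} standard deviations ${direction} normal`;
}

// Example: a CPU reading well above the recent baseline.
const cpuBaseline = computeBaseline([42, 45, 40, 44, 43, 41]);
console.log(checkAnomaly("cpu_usage", 57, cpuBaseline));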

Active anomalies show up on your dashboard with severity levels based on how far the metric has drifted from baseline. The Anomaly Radar visualization (a constellation chart on the main dashboard) plots current values against baselines across all 6 metric axes, giving you a quick read on overall application health.

You can configure anomaly-based alerts that fire when specific metrics exceed their baseline thresholds, which is useful when you don't know what the "right" static threshold should be. This works well alongside traditional threshold alerts: use fixed thresholds for hard limits (e.g., CPU > 95%) and anomaly alerts for catching unexpected drift.

Creating Alerts

  1. Navigate to Site Settings > Alerts
  2. Click "Create Alert"
  3. Configure the alert:
Name: High Response Time
Condition: avg(method.responseTime) > 1000ms
For: 5 minutes
Severity: warning

Alert Configuration

Conditions

Define when the alert triggers:

Metric              | Operators             | Example
method.responseTime | greater, less, equals | greater than 1000ms
method.errorRate    | greater, less, equals | greater than 5%
system.cpuUsage     | greater, less, equals | greater than 80%
system.memoryUsage  | greater, less, equals | greater than 90%
error.count         | greater, less, equals | greater than 100
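
Conceptually, each condition is just a metric, an operator, and a threshold compared against the latest value. The sketch below shows that evaluation in isolation; the Condition type and field names are illustrative, not part of SkySignal's API.

// Illustrative sketch of how a threshold condition could be evaluated.
// Operator names mirror the table above; the types are assumptions.

type Operator = "greater" | "less" | "equals";

interface Condition {
  metric: string;    // e.g. "system.cpuUsage"
  operator: Operator;
  threshold: number; // e.g. 80 (percent) or 1000 (ms)
}

function conditionMet(condition: Condition, currentValue: number): boolean {
  switch (condition.operator) {
    case "greater":
      return currentValue > condition.threshold;
    case "less":
      return currentValue < condition.threshold;
    case "equals":
      return currentValue === condition.threshold;
  }
}

// "system.cpuUsage greater than 80%"
const highCpu: Condition = { metric: "system.cpuUsage", operator: "greater", threshold: 80 };
console.log(conditionMet(highCpu, 91)); // true -> alert moves toward firing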

Duration

How long the condition must be true:

  • Instant - Alert immediately
  • 1 minute - Sustained for 1 minute
  • 5 minutes - Sustained for 5 minutes
  • 15 minutes - Sustained for 15 minutes

Severity

  • Info - Informational, no action needed
  • Warning - Should investigate
  • Critical - Requires immediate attention

Notification Channels

Email

Email is the default notification method. Alerts are sent to:

  • Account owner
  • Team members (configurable)

Email notifications include:

  • Alert name and severity
  • Current value vs threshold
  • Direct link to affected resource
  • Quick investigation steps

Webhook (Coming Soon)

Send alert payloads to any HTTP endpoint. Useful for integrating with incident management tools or custom workflows.
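
Since webhooks are not released yet, the payload shape below is purely hypothetical (the name, severity, currentValue, threshold, and link fields are assumptions, not SkySignal's schema). It only illustrates the general pattern of receiving an alert as JSON over HTTP and handing it to your own tooling.

// Hypothetical webhook receiver: payload fields are assumptions.
import { createServer } from "http";

interface AlertPayload {
  name: string;
  severity: "info" | "warning" | "critical";
  currentValue: number;
  threshold: number;
  link: string; // direct link to the affected resource
}

// Minimal receiver: accepts POSTs at /skysignal-alerts and logs them.
createServer((req, res) => {
  if (req.method !== "POST" || req.url !== "/skysignal-alerts") {
    res.statusCode = 404;
    res.end();
    return;
  }
  let body = "";
  req.on("data", (chunk) => (body += chunk));
  req.on("end", () => {
    const alert = JSON.parse(body) as AlertPayload;
    // Forward to an incident tool, pager, or internal workflow here.
    console.log(`[${alert.severity}] ${alert.name}: ${alert.currentValue} vs threshold ${alert.threshold}`);
    res.statusCode = 204;
    res.end();
  });
}).listen(3000);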

Slack (Coming Soon)

Post alerts directly to a Slack channel with formatted messages and action links.

Alert Examples

Slow Method Alert

Name: Slow API Methods
Condition: p95(method.responseTime) > 2000ms
For: 5 minutes
Severity: warning
Filter: methodName matches "api.*"

Error Spike Alert

Name: Error Rate Spike
Condition: errorRate > 5%
For: 1 minute
Severity: critical

Memory Alert

Name: High Memory Usage
Condition: system.memoryUsage > 85%
For: 10 minutes
Severity: warning

New Error Alert

Name: New Error Type
Condition: newErrorType = true
Severity: warning

Managing Alerts

Alert States

  • OK - Condition not met
  • Pending - Condition met, waiting for duration
  • Firing - Alert active, notifications sent
  • Resolved - Was firing, now OK
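
These states can be read as a small state machine driven by the condition check and the configured duration. The sketch below is illustrative, not SkySignal's implementation: an alert stays Pending until the condition has been continuously true for the "For" duration, then fires; a firing alert whose condition clears becomes Resolved.

// Illustrative state machine for the alert lifecycle above.

type AlertState = "OK" | "Pending" | "Firing" | "Resolved";

interface AlertRuntime {
  state: AlertState;
  conditionTrueSince?: number; // epoch ms of the first breaching sample
}

// One evaluation tick: feed in whether the condition currently holds.
function nextState(
  alert: AlertRuntime,
  conditionMet: boolean,
  durationMs: number,
  now = Date.now()
): AlertRuntime {
  if (!conditionMet) {
    // A firing alert that recovers becomes Resolved; everything else returns to OK.
    return { state: alert.state === "Firing" ? "Resolved" : "OK" };
  }
  const since = alert.conditionTrueSince ?? now;
  const sustained = now - since >= durationMs;
  return { state: sustained ? "Firing" : "Pending", conditionTrueSince: since };
}

// Example: a 5-minute "For" window.
let alert: AlertRuntime = { state: "OK" };
alert = nextState(alert, true, 5 * 60 * 1000); // -> Pending
// ...after five minutes of breaching samples -> Firing, notifications sent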

Silencing Alerts

Temporarily disable alerts during maintenance:

  1. Go to Alerts
  2. Click "Silence"
  3. Set duration (1h, 4h, 24h, custom)
  4. Add reason (optional)

Alert History

View past alerts:

  • When it fired
  • Duration
  • Resolution
  • Related metrics

Best Practices

1. Start Conservative

Begin with loose thresholds and tighten them over time:

# Start here
Condition: responseTime > 5000ms

# Tighten after baseline
Condition: responseTime > 2000ms

2. Use Duration

Avoid alert fatigue with duration requirements:

# Bad: Alerts on every spike
For: instant

# Better: Sustained issues only
For: 5 minutes

3. Set Appropriate Severity

  • Info: Awareness, no action
  • Warning: Investigate when convenient
  • Critical: Wake someone up

4. Consolidate Similar Alerts

Reduce noise by consolidating related alerts:

  • Group by service or feature area
  • Use composite conditions when possible

Next Steps