Anomaly Detection

SkySignal automatically detects unusual behavior in your application without requiring you to configure thresholds.

How It Works

SkySignal continuously analyzes historical data for your application to establish baselines for key metrics. When a metric deviates significantly from its baseline, SkySignal flags it as an anomaly.

The detection uses statistical analysis -- specifically, it tracks the rolling mean and standard deviation for each metric, then flags an anomaly when the current value sits too many standard deviations away from the mean (see Understanding Anomaly Severity below for the cutoffs). No machine learning black box, just straightforward statistics that are easy to reason about.
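A minimal Python sketch of the idea, assuming a fixed-size window of recent samples. The class name `RollingBaseline`, the window size, and the 2-standard-deviation cutoff are illustrative assumptions, not SkySignal's internals. The second half of the example shows how a rolling window lets the baseline adapt when traffic shifts.

```python
import math
import statistics
from collections import deque


class RollingBaseline:
    """Hypothetical rolling baseline: mean and standard deviation of the last `window` samples."""

    def __init__(self, window: int = 100):
        # A deque with maxlen discards the oldest sample on each append,
        # so the baseline follows gradual shifts in traffic.
        self.samples = deque(maxlen=window)

    def add(self, value: float) -> None:
        self.samples.append(value)

    def zscore(self, value: float) -> float:
        """Signed number of standard deviations `value` lies from the rolling mean."""
        if len(self.samples) < 2:
            return 0.0  # not enough history to estimate a spread yet
        mean = statistics.fmean(self.samples)
        stdev = statistics.pstdev(self.samples)
        if stdev == 0:
            # Perfectly flat history: any change at all is an unbounded deviation
            return 0.0 if value == mean else math.copysign(math.inf, value - mean)
        return (value - mean) / stdev

    def is_anomalous(self, value: float, threshold: float = 2.0) -> bool:
        # Illustrative cutoff: 2 standard deviations in either direction
        return abs(self.zscore(value)) >= threshold


baseline = RollingBaseline(window=5)
for v in [7.0, 7.2, 6.8, 7.1, 6.9]:
    baseline.add(v)
print(baseline.is_anomalous(11.3))  # True: far outside a tight baseline around 7.0

# Traffic shifts upward; new readings push the old ones out of the window
for v in [11.0, 11.5, 11.25, 11.0, 11.25]:
    baseline.add(v)
print(baseline.is_anomalous(11.3))  # False: 11.3 is now within the normal range
```

A window of 5 keeps the example readable; a real baseline needs much more history before its mean and spread are trustworthy.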

Monitored Metrics

Anomaly detection runs across six core metrics:

Metric          | What It Tracks
Response Time   | Average method response time
Error Rate      | Percentage of requests resulting in errors
CPU Usage       | Server CPU utilization
Memory Usage    | Server memory utilization
Requests/Hour   | Incoming request volume
Active Sessions | Number of connected users

These cover the key dimensions of application health -- performance, reliability, resources, and usage.

Zero Configuration

Anomaly detection requires no setup. Once your application starts sending data to SkySignal, baselines are computed automatically from your historical data. As your application's traffic patterns change over time, the baselines adapt.

The detection job runs as a background process at regular intervals, checking each metric against its baseline.

Anomaly Radar

The dashboard includes an Anomaly Radar visualization -- a constellation chart that plots current values against baselines across all six metric axes. This gives you a quick visual read on overall application health:

  • Values near the center are close to baseline (normal)
  • Values extending outward are deviating from baseline (potentially anomalous)
  • The shape of the radar tells you at a glance which dimensions are off

The radar is useful during incident triage. Instead of checking six different charts, you get a single view showing which metrics are abnormal.
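To make the radar's geometry concrete, here is a sketch under one assumption: each axis's distance from the center is the absolute z-score for that metric, with the six axes evenly spaced. SkySignal's actual scaling is not documented here, and `radar_points` and the sample z-scores are hypothetical.

```python
import math

# The six radar axes, in the order the metrics are listed in Monitored Metrics
METRICS = ["Response Time", "Error Rate", "CPU Usage",
           "Memory Usage", "Requests/Hour", "Active Sessions"]


def radar_points(zscores: dict[str, float]) -> list[tuple[str, float, float]]:
    """Hypothetical layout: place each metric's |z| on its own evenly spaced axis."""
    points = []
    for i, name in enumerate(METRICS):
        angle = 2 * math.pi * i / len(METRICS)  # six axes, 60 degrees apart
        radius = abs(zscores.get(name, 0.0))    # assumption: distance from center = |z|
        points.append((name, radius * math.cos(angle), radius * math.sin(angle)))
    return points


# Made-up z-scores: everything near baseline except CPU Usage
current = {"Response Time": 0.4, "Error Rate": 0.2, "CPU Usage": 3.3,
           "Memory Usage": 0.8, "Requests/Hour": 0.1, "Active Sessions": 0.5}
for name, x, y in radar_points(current):
    print(f"{name:16} ({x:+.2f}, {y:+.2f})  distance {math.hypot(x, y):.1f}")
```

The CPU Usage point lands 3.3 units from the center while the others stay within 1, producing the lopsided shape that singles out the dimension that is off.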

Active Anomaly Alerts

When an anomaly is detected, it appears as a banner on your dashboard with:

  • Metric name - Which metric is anomalous
  • Severity - Based on how many standard deviations the value is from the baseline
  • Description - Human-readable explanation of what's happening

For example:

cpu_usage is 3.3 standard deviations above normal.
Observed: 11.31, Expected: 7.10 +/- 1.29

This tells you exactly what the current value is, what normal looks like, and how far off you are. No guesswork about whether a value is "high" -- you can see the numbers.
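The numbers in the example are consistent with a plain z-score: (11.31 - 7.10) / 1.29 ≈ 3.26, which rounds to the 3.3 shown. The sketch below rebuilds the message from those three values; `describe_anomaly` is a hypothetical helper whose output format is copied from the example, not taken from SkySignal's code.

```python
def describe_anomaly(metric: str, observed: float, mean: float, stdev: float) -> str:
    """Format an anomaly message in the style of the example (hypothetical helper)."""
    z = (observed - mean) / stdev  # signed distance from the baseline, in standard deviations
    direction = "above" if z >= 0 else "below"
    return (f"{metric} is {abs(z):.1f} standard deviations {direction} normal.\n"
            f"Observed: {observed:.2f}, Expected: {mean:.2f} +/- {stdev:.2f}")


print(describe_anomaly("cpu_usage", 11.31, 7.10, 1.29))
```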

Anomaly-Based Alerts

You can create alert rules that fire based on anomaly detection rather than fixed thresholds. This is useful when:

  • You don't know what the "right" threshold should be
  • Normal values vary by time of day or day of week
  • You want to catch unexpected changes without setting static limits

Anomaly alerts work well alongside traditional threshold alerts. Use fixed thresholds for hard limits you never want to exceed (like CPU > 95%), and anomaly alerts for catching drift you wouldn't have predicted.
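As a sketch of how the two kinds of rule complement each other, assume a hypothetical rule that fires on either a hard CPU ceiling or a large deviation from baseline. The 95% limit comes from the paragraph above; the 3-standard-deviation cutoff and the `should_alert` name are illustrative assumptions, not SkySignal's alert-rule syntax.

```python
def should_alert(cpu_percent: float, cpu_z: float,
                 hard_limit: float = 95.0, z_limit: float = 3.0) -> tuple[bool, str]:
    """Hypothetical combined rule: a fixed ceiling plus an anomaly check on the same metric."""
    if cpu_percent > hard_limit:
        return True, "threshold"  # absolute limit: fires regardless of what is normal
    if abs(cpu_z) >= z_limit:
        return True, "anomaly"    # unusual relative to this app's own baseline
    return False, "ok"


print(should_alert(97.0, 0.5))   # (True, 'threshold'): high, even if it is normal for this app
print(should_alert(11.31, 3.3))  # (True, 'anomaly'): low in absolute terms, but unusual
print(should_alert(40.0, 1.0))   # (False, 'ok')
```

Neither check alone catches both cases: the fixed limit misses the drift at 11.31%, and the anomaly check would stay quiet if CPU had been near 97% all along.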

See the Alerting Guide for details on creating anomaly-based alert rules.

Understanding Anomaly Severity

Severity is determined by the number of standard deviations from the baseline:

Standard Deviations | Severity | Meaning
2-3                 | Low      | Notable deviation, worth a look
3-4                 | Medium   | Significant deviation, likely a real issue
4+                  | High     | Extreme deviation, something is clearly wrong

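These thresholds work well for most applications. Assuming roughly normally distributed values, a deviation of 3 or more standard deviations in either direction occurs by chance only about 0.3% of the time, so a value that far out is unlikely to be random noise and most likely reflects a real change in behavior. A 2-sigma deviation, by comparison, happens about 5% of the time by chance, which is why Low severity means "worth a look" rather than "definitely a problem."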
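The severity buckets and the chance figures can be checked with a short sketch. It assumes normally distributed values and, because the table's ranges share their edges, that a value exactly at 3 or 4 falls in the higher bucket; `severity` and `two_sided_tail` are hypothetical helpers, not SkySignal's code.

```python
import math
from typing import Optional


def severity(z: float) -> Optional[str]:
    """Map a z-score to the table's buckets (hypothetical helper).

    Assumption: a value exactly on an edge (3 or 4) goes to the higher bucket.
    """
    d = abs(z)
    if d >= 4:
        return "High"
    if d >= 3:
        return "Medium"
    if d >= 2:
        return "Low"
    return None  # under 2 standard deviations: not flagged


def two_sided_tail(k: float) -> float:
    """P(|Z| >= k) for a standard normal Z: the chance of a k-sigma deviation either way."""
    return math.erfc(k / math.sqrt(2))


for k in (2, 3, 4):
    print(f"{k} sigma: {two_sided_tail(k):.3%}")
print(severity(3.26))  # Medium (the cpu_usage example)
```

The loop prints approximately 4.55%, 0.27%, and 0.006%. Real metrics are rarely exactly normal, so treat these as rough guides rather than exact false-positive rates.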

Practical Tips

Don't Panic on Every Anomaly

Low-severity anomalies happen. Brief CPU spikes during cron jobs and traffic bumps from marketing emails are expected. Focus on sustained anomalies and high-severity ones.
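One way to express this tip as a triage rule, sketched under stated assumptions: look at the severities from the most recent checks (oldest first, `None` meaning no anomaly) and surface an anomaly only if the latest one is High or the last few checks were all anomalous. The three-check window and the `worth_attention` helper are hypothetical; SkySignal's own alerting may behave differently.

```python
from typing import Optional, Sequence


def worth_attention(history: Sequence[Optional[str]], min_consecutive: int = 3) -> bool:
    """Hypothetical triage rule over per-check severities, oldest first (None = no anomaly).

    True if the latest check is High severity, or if the last `min_consecutive`
    checks were all anomalous (a sustained deviation rather than a blip).
    """
    if history and history[-1] == "High":
        return True
    recent = history[-min_consecutive:]
    return len(recent) == min_consecutive and all(s is not None for s in recent)


print(worth_attention([None, None, "Low"]))       # False: a single brief blip
print(worth_attention(["Low", "Low", "Medium"]))  # True: sustained for three checks
print(worth_attention([None, None, "High"]))      # True: high severity
```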

Use Anomalies for Postmortem Context

When investigating an incident, check the anomaly history to see which metrics went anomalous and when. This often reveals the root cause faster than manually checking each metric chart.
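A sketch of that check, assuming the anomaly history is available as (metric, time first flagged) pairs; the record shape, the `first_anomalies` helper, and the sample timestamps are all hypothetical. Ordering metrics by when they first went anomalous is one heuristic for spotting where a problem started.

```python
from datetime import datetime


def first_anomalies(records: list[tuple[str, datetime]]) -> list[tuple[str, datetime]]:
    """Return each metric's earliest anomaly, ordered from first to last to go anomalous."""
    earliest: dict[str, datetime] = {}
    for metric, ts in records:
        if metric not in earliest or ts < earliest[metric]:
            earliest[metric] = ts
    return sorted(earliest.items(), key=lambda item: item[1])


# Made-up history from a single incident, in arbitrary order
history = [
    ("Error Rate", datetime(2024, 5, 1, 14, 7)),
    ("Memory Usage", datetime(2024, 5, 1, 13, 52)),
    ("Response Time", datetime(2024, 5, 1, 14, 1)),
    ("Error Rate", datetime(2024, 5, 1, 14, 3)),
]
for metric, ts in first_anomalies(history):
    print(ts.strftime("%H:%M"), metric)
```

In this made-up incident, memory went anomalous first, then response time, then errors, which suggests looking at memory before the error spike.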

Let Baselines Stabilize

Anomaly detection works best after your application has been sending data for at least a few days. Until then, and especially during the first few hours, baselines are still being established and you may see false positives.
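One common way to limit these early false positives, sketched here as an assumption rather than a description of SkySignal's internals, is to skip the check until a baseline has a minimum number of samples. The `MIN_SAMPLES` value and the `check` helper are hypothetical.

```python
import statistics
from typing import Optional, Sequence

MIN_SAMPLES = 30  # hypothetical warm-up size, chosen for illustration


def check(history: Sequence[float], value: float, threshold: float = 2.0) -> Optional[bool]:
    """Return None while the baseline is warming up, else whether `value` is anomalous."""
    if len(history) < MIN_SAMPLES:
        return None  # too little history: mean and spread are not yet reliable
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        return value != mean  # flat history: any change is a deviation
    return abs(value - mean) / stdev >= threshold


print(check([7.0, 7.5, 6.5], 11.3))    # None: still warming up
print(check([10.0, 12.0] * 15, 20.0))  # True: 9 standard deviations from a mean of 11
```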

Next Steps