Background Jobs

SkySignal monitors background job queues in your Meteor application, giving you visibility into job execution, failures, queue depth, and worker utilization. The agent auto-detects your job package and begins tracking without any code changes.

Supported Packages

| Package | Identifier | Storage | Auto-Detected |
| --- | --- | --- | --- |
| msavin:sjobs (Steve Jobs) | msavin:sjobs | MongoDB (jobs_data collection) | Yes |
| BullMQ | bullmq | Redis | Yes |

The agent checks for each package at startup and activates monitoring for every supported package it finds. If your app uses both packages, the agent monitors both and tags each job with its originating package so you can filter and compare them in the dashboard.

msavin:sjobs (Steve Jobs)

Steve Jobs is a MongoDB-backed job queue built for Meteor. The agent monitors it by observing the jobs_data collection for state changes.

Requirements:

  • msavin:sjobs package installed (meteor add msavin:sjobs)
  • No additional configuration needed -- the agent detects the global Jobs object automatically

Configuration:

{
  "skysignal": {
    "collectJobs": true,
    "jobsInterval": 30000
  }
}

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| collectJobs | Boolean | true | Enable or disable job monitoring |
| jobsInterval | Integer | 30000 | How often to collect queue statistics (ms) |

Steve Jobs uses a single "default" queue. The agent tracks job lifecycle through MongoDB observer callbacks, so there is no polling overhead for individual job events -- only the periodic stats collection runs on the interval.

BullMQ

BullMQ is a Redis-backed job queue. The agent discovers queues by scanning Redis for BullMQ key patterns (bull:*:meta) and attaches QueueEvents listeners for real-time job tracking.
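
To illustrate the real-time tracking described above, here is a minimal sketch that maps BullMQ QueueEvents events to the job statuses used on this page. The attachJobListeners helper and the record callback are hypothetical names, not part of the SkySignal agent; only the event names (active, completed, failed, stalled) and the jobId and failedReason payload fields come from BullMQ.

```javascript
// Hypothetical helper: subscribes to a BullMQ QueueEvents instance (or any
// object exposing .on(event, handler)) and reports status changes.
function attachJobListeners(queueEvents, queueName, record) {
  // BullMQ event name -> status name used in this documentation.
  const statusByEvent = {
    active: 'running',
    completed: 'completed',
    failed: 'failed',
    stalled: 'stalled',
  };

  for (const [eventName, status] of Object.entries(statusByEvent)) {
    queueEvents.on(eventName, (payload) => {
      record({
        jobId: payload.jobId,
        queueName,
        jobsPackage: 'bullmq',
        status,
        // failedReason is only present on the "failed" event.
        error: payload.failedReason ?? null,
      });
    });
  }
}
```

With bullmq installed, the queueEvents argument would typically be created with new QueueEvents("emailQueue", { connection }), where connection holds the Redis host and port.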

Requirements:

  • bullmq npm package installed (npm install bullmq)
  • ioredis npm package installed (npm install ioredis) -- used for queue discovery
  • A running Redis instance

Configuration:

{
  "skysignal": {
    "collectJobs": true,
    "jobsInterval": 30000,
    "bullmqRedis": {
      "host": "localhost",
      "port": 6379
    }
  }
}

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| collectJobs | Boolean | true | Enable or disable job monitoring |
| jobsInterval | Integer | 30000 | How often to collect queue statistics (ms) |
| bullmqRedis | Object | { host: "localhost", port: 6379 } | Redis connection for queue discovery |
| bullmqQueues | Array | [] | Manually specify queues to monitor (see below) |
| detailedTracking | Boolean | true | Fetch full job details on failure for stacktraces |
| jobCacheMaxSize | Integer | 2000 | Max entries in the job detail cache |
| jobCacheTTL | Integer | 120000 | Job cache entry TTL in ms |

Manual queue configuration:

If the agent cannot discover your queues via Redis scanning (e.g., non-standard Redis key prefix), you can list them explicitly:

{
  "skysignal": {
    "collectJobs": true,
    "bullmqQueues": [
      {
        "name": "emailQueue",
        "connection": { "host": "redis.example.com", "port": 6379 }
      },
      {
        "name": "reportQueue"
      }
    ]
  }
}

Queues listed in bullmqQueues are monitored immediately on startup. The agent also performs periodic Redis scans to discover new queues that are created after the app starts.

Queue discovery

BullMQ queues are discovered by scanning Redis for keys matching bull:*:meta. If your queues don't appear in the dashboard, make sure the agent's Redis connection matches the one your app uses, or list the queues manually via bullmqQueues.
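
The sketch below shows why a non-standard key prefix hides a queue from discovery. It extracts queue names from Redis keys shaped like bull:<queueName>:meta. The queueNamesFromKeys helper is a hypothetical name used only for illustration, and the key list is sample data; in a real app the keys would come from a Redis scan, for example ioredis's scanStream({ match: "bull:*:meta" }).

```javascript
// Hypothetical helper: given Redis key names, return the BullMQ queue names
// for keys shaped like "<prefix>:<queueName>:meta".
function queueNamesFromKeys(keys, prefix = 'bull') {
  const head = `${prefix}:`;
  const tail = ':meta';
  const names = new Set();
  for (const key of keys) {
    if (!key.startsWith(head) || !key.endsWith(tail)) continue;
    const name = key.slice(head.length, key.length - tail.length);
    if (name.length > 0) names.add(name);
  }
  return [...names].sort();
}

// Sample keys, as a scan for "*:meta" might return them.
const found = queueNamesFromKeys([
  'bull:emailQueue:meta',
  'bull:reportQueue:meta',
  'myapp:otherQueue:meta', // custom prefix: skipped with the default prefix
]);
console.log(found); // [ 'emailQueue', 'reportQueue' ]
```

A queue created with a custom prefix, such as otherQueue here, is the case where listing it in bullmqQueues is the practical fix.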

Forcing a Specific Package

If your app has both packages installed but you only want to monitor one:

{
  "skysignal": {
    "jobsPackage": "bullmq"
  }
}

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| jobsPackage | String | Auto-detect | Force a specific package: "msavin:sjobs" or "bullmq" |
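
The interaction between auto-detection and jobsPackage can be pictured with the following sketch. It is an assumption about the selection logic, not the agent's actual code: selectJobsPackages and its detected argument are hypothetical, and throwing on an unknown value is a choice made for this example only.

```javascript
// Package identifiers listed in the Supported Packages table.
const SUPPORTED = ['msavin:sjobs', 'bullmq'];

// Hypothetical helper: decide which job monitors to start.
// `settings` is the "skysignal" settings object; `detected` lists the
// packages found in the app at startup.
function selectJobsPackages(settings, detected) {
  if (settings.collectJobs === false) return [];

  const forced = settings.jobsPackage;
  if (forced === undefined) {
    // Auto-detect: monitor every supported package that was found.
    return detected.filter((pkg) => SUPPORTED.includes(pkg));
  }
  if (!SUPPORTED.includes(forced)) {
    throw new Error(`Unsupported jobsPackage: ${forced}`);
  }
  // Forced: monitor only the named package, and only if it is installed.
  return detected.includes(forced) ? [forced] : [];
}
```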

What Gets Tracked

For every job that runs through your queue, the agent captures:

| Field | Description |
| --- | --- |
| jobId | Unique identifier for the job |
| jobName | Job name / type (e.g., "sendWelcomeEmail") |
| jobType | Auto-inferred category (email, report, sync, etc.) |
| queueName | Queue the job belongs to ("default" for Steve Jobs, queue name for BullMQ) |
| jobsPackage | Originating package ("msavin:sjobs" or "bullmq") |
| status | Current state: pending, running, completed, failed, stalled, cancelled |
| queuedAt | When the job was added to the queue |
| startedAt | When execution began |
| completedAt | When execution finished |
| duration | Execution time in milliseconds |
| delay | Time spent waiting in queue before execution (ms) |
| attempts | Number of execution attempts |
| error | Error details if the job failed (message, stack trace) |
| priority | Job priority level |
| progress | Progress percentage for long-running jobs (0-100) |
| host | Server hostname that processed the job |
| originatingTraceId | Trace ID of the Meteor Method that enqueued the job |

Job Status Lifecycle

pending → running → completed
                  → failed
                  → stalled → failed (auto after 30 min)
pending → cancelled

Jobs that remain in running status for more than 30 minutes without a completion event are automatically marked as failed with a StaleJobTimeout error. This handles cases where the monitored app crashes mid-job.
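
The timeout rule above can be sketched as a simple sweep over tracked job records. This is an illustration under stated assumptions, not the agent's implementation: failStaleJobs is a hypothetical name, and startedAt and completedAt are assumed to be millisecond timestamps.

```javascript
const STALE_AFTER_MS = 30 * 60 * 1000; // 30 minutes, per the rule above

// Hypothetical helper: returns a copy of the job list in which jobs that
// have been running too long without a completion event are marked failed.
function failStaleJobs(jobs, now = Date.now()) {
  return jobs.map((job) => {
    const isStale =
      job.status === 'running' &&
      job.completedAt == null &&
      now - job.startedAt > STALE_AFTER_MS;
    if (!isStale) return job;
    return {
      ...job,
      status: 'failed',
      error: { message: 'StaleJobTimeout' },
    };
  });
}
```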

Stalled Job Detection

Both monitors detect stalled jobs, but through different mechanisms:

  • Steve Jobs: A job is considered stalled when its due time is more than 5 minutes in the past and its state is still pending
  • BullMQ: Uses BullMQ's built-in stalled event, which fires when a worker fails to renew its lock within the configured stall interval
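
For Steve Jobs, the stalled rule above reduces to a check on two fields of the job document. A minimal sketch, assuming the document exposes a state string and a due Date (isStalledSteveJob is a hypothetical name):

```javascript
const STALLED_AFTER_MS = 5 * 60 * 1000; // 5 minutes past the due time

// Hypothetical check: a job is stalled when it is still pending more than
// five minutes after it was due to run.
// Assumes `doc.state` is a string and `doc.due` is a Date.
function isStalledSteveJob(doc, now = new Date()) {
  const overdueMs = now.getTime() - doc.due.getTime();
  return doc.state === 'pending' && overdueMs > STALLED_AFTER_MS;
}
```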

Trace Correlation

When a Meteor Method enqueues a background job, SkySignal links the job back to the originating method trace. This lets you follow the full request lifecycle from the initial Method call through to job completion.

How it works:

  • Steve Jobs: The agent wraps Jobs.run() and captures the current method's traceId from AsyncLocalStorage. When the job document appears in MongoDB, the agent matches it to the pending trace context.
  • BullMQ: The agent wraps Queue.add() and Queue.addBulk() to inject a __skysignal_traceId field into the job data. When the job executes, the agent extracts the trace ID from the job payload.
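
The BullMQ side of this can be pictured with the sketch below, which wraps a queue's add(name, data, opts) method to inject the trace ID and reads it back on the worker side. It is a simplified illustration: wrapQueueAdd, traceIdFromJob, and the getTraceId callback are hypothetical names, getTraceId stands in for however the agent reads the active Method trace, and Queue.addBulk() is not covered.

```javascript
const TRACE_FIELD = '__skysignal_traceId';

// Hypothetical wrapper: patches queue.add so every enqueued job carries the
// current trace ID in its data payload.
function wrapQueueAdd(queue, getTraceId) {
  const originalAdd = queue.add.bind(queue);
  queue.add = (name, data, opts) => {
    const traceId = getTraceId();
    // Leave the payload untouched when no Method trace is active.
    const payload = traceId ? { ...data, [TRACE_FIELD]: traceId } : data;
    return originalAdd(name, payload, opts);
  };
  return queue;
}

// Worker side: read the trace ID back out of the job payload.
function traceIdFromJob(job) {
  return job.data?.[TRACE_FIELD] ?? null;
}
```

Because the trace ID travels inside the job data, it survives the trip through Redis and is available on whichever server processes the job.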

Trace correlation is automatic -- no code changes required. In the dashboard, jobs with an originating trace show a link to the parent method trace.

Dashboard

Navigate to your site's Jobs tab to see the monitoring data.

Overview

The top of the page shows four summary cards:

  • Total Jobs -- Count of jobs in the selected time range, with throughput (jobs/min)
  • Success Rate -- Percentage of jobs that completed successfully
  • Avg Execution -- Average job duration, plus the longest job in the period
  • Failed Jobs -- Count of failed and stalled jobs that need attention

Filtering

Two filter dropdowns appear in the tab bar when applicable:

  • Package filter -- Shown when your app uses multiple job packages. Filter by msavin:sjobs, bullmq, or view all.
  • Queue filter -- Shown when multiple queues exist. Filter by specific queue or view all.

Both filters apply to all tabs: Running, Failed, Scheduled, Recent Jobs, Performance, and Analytics.

Tabs

Running -- Currently executing jobs with live duration, progress bars, and the ability to cancel.

Failed -- Jobs that failed or stalled, with error details. Stalled jobs are highlighted with a warning badge on the tab.

Scheduled -- Pending jobs waiting to execute, including scheduled future runs and repeat patterns.

Recent Jobs -- Full history of recent job executions. Click a row to expand and see job details, error stack traces, and a link to the originating method trace.

Performance -- Per-job-type performance breakdown with P50/P95/P99 duration percentiles, failure rates, throughput, and trend comparison against the previous period.

Analytics -- Queue depth history over time, failure rate trend charts, and worker utilization gauges per queue.

Best Practices

Monitor failure rates

A failure rate above 5% usually indicates a systemic issue. Use the Performance tab to identify which job types are failing most, then check the Failed tab for error details and stack traces.
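
As a rough illustration of the 5% guideline, the sketch below computes a failure rate per job name from tracked job records. How the dashboard actually computes its figures is not specified here; failureRates is a hypothetical helper, and counting only completed and failed jobs as finished is an assumption of this example.

```javascript
// Hypothetical helper: failure rate per jobName, flagging job types whose
// rate exceeds the threshold (5% by default).
function failureRates(jobs, threshold = 0.05) {
  const totals = new Map();
  for (const { jobName, status } of jobs) {
    // Only finished jobs count toward the rate in this sketch.
    if (status !== 'completed' && status !== 'failed') continue;
    const t = totals.get(jobName) ?? { finished: 0, failed: 0 };
    t.finished += 1;
    if (status === 'failed') t.failed += 1;
    totals.set(jobName, t);
  }
  return [...totals].map(([jobName, t]) => {
    const rate = t.failed / t.finished;
    return { jobName, rate, aboveThreshold: rate > threshold };
  });
}
```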

Watch queue depth

A growing queue depth means jobs are being enqueued faster than they can be processed. Either increase worker concurrency or optimize job execution time. The Analytics tab shows queue depth over time so you can spot trends.

Use trace correlation

When a job fails, check if it has an originating trace. The parent Method call often provides context about why the job was created with bad data or under unusual conditions.

Set appropriate intervals

The default jobsInterval of 30 seconds works well for most apps. For high-throughput queues (thousands of jobs per minute), consider increasing it to 60 seconds to reduce overhead. Job lifecycle events (start, complete, fail) are always tracked in real-time regardless of the interval setting.

{
  "skysignal": {
    "collectJobs": true,
    "jobsInterval": 60000
  }
}

Troubleshooting

Jobs Not Appearing in Dashboard

  1. Verify collectJobs is true (or not set -- it defaults to true)
  2. Check that your job package is installed and running
  3. Enable debug mode to see agent logs: "debug": true
  4. For BullMQ, verify the Redis connection matches your app's Redis

BullMQ Queues Not Discovered

  1. Check that ioredis is installed (npm ls ioredis)
  2. Verify the bullmqRedis connection settings match your Redis instance
  3. Try listing queues manually via bullmqQueues
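  4. Check Redis for BullMQ keys: redis-cli --scan --pattern "bull:*:meta" (on a busy production instance, prefer this over KEYS, which blocks Redis while it runs)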

Steve Jobs Observer Not Starting

  1. Verify the Jobs global is available (typeof Jobs in meteor shell)
  2. Check that Jobs.collection exists
  3. The observer requires server-side Meteor reactivity -- make sure you are not running in a non-standard Meteor environment

High Memory Usage

If you have a very high-throughput BullMQ setup, the job detail cache can grow. Reduce jobCacheMaxSize or jobCacheTTL:

{
  "skysignal": {
    "jobCacheMaxSize": 500,
    "jobCacheTTL": 60000
  }
}
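
To show how the two settings bound memory, here is a minimal sketch of a size-capped cache with per-entry expiry. It is not the agent's cache: JobDetailCache is a hypothetical class, and evicting the oldest entry first is an assumption made for this example.

```javascript
// Hypothetical cache: holds at most maxSize entries and treats entries
// older than ttlMs as missing.
class JobDetailCache {
  constructor({ maxSize = 2000, ttlMs = 120000 } = {}) {
    this.maxSize = maxSize;
    this.ttlMs = ttlMs;
    this.entries = new Map(); // a Map iterates in insertion order
  }

  set(jobId, details, now = Date.now()) {
    this.entries.delete(jobId); // re-inserting moves the key to the end
    if (this.entries.size >= this.maxSize) {
      // Evict the oldest entry to stay within the size cap.
      const oldestKey = this.entries.keys().next().value;
      this.entries.delete(oldestKey);
    }
    this.entries.set(jobId, { details, storedAt: now });
  }

  get(jobId, now = Date.now()) {
    const entry = this.entries.get(jobId);
    if (!entry) return undefined;
    if (now - entry.storedAt > this.ttlMs) {
      this.entries.delete(jobId); // expired: drop it on read
      return undefined;
    }
    return entry.details;
  }
}
```

A smaller maxSize caps the number of entries held at once, while a shorter ttlMs shortens how long entries remain usable before they are discarded.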

Stale Jobs Accumulating

Jobs stuck in running for over 30 minutes are auto-marked as failed. If you see many stale jobs, it usually means:

  1. Your app crashed or restarted while jobs were running
  2. Job execution exceeds 30 minutes (consider splitting long jobs into smaller steps)
  3. The agent lost connection to the monitored app

Next Steps