Background Jobs
SkySignal monitors background job queues in your Meteor application, giving you visibility into job execution, failures, queue depth, and worker utilization. The agent auto-detects your job package and begins tracking without any code changes.
Supported Packages
| Package | Identifier | Storage | Auto-Detected |
|---|---|---|---|
| msavin:sjobs (Steve Jobs) | msavin:sjobs | MongoDB (jobs_data collection) | Yes |
| BullMQ | bullmq | Redis | Yes |
The agent checks for each package at startup and activates a monitor for every one it finds. If your app uses both packages, the agent monitors both and tags each job with its originating package so you can filter and compare them in the dashboard.
msavin:sjobs (Steve Jobs)
Steve Jobs is a MongoDB-backed job queue built for Meteor. The agent monitors it by observing the jobs_data collection for state changes.
Requirements:
- `msavin:sjobs` package installed (`meteor add msavin:sjobs`)
- No additional configuration needed -- the agent detects the global `Jobs` object automatically
Configuration:
```json
{
  "skysignal": {
    "collectJobs": true,
    "jobsInterval": 30000
  }
}
```
| Option | Type | Default | Description |
|---|---|---|---|
| `collectJobs` | Boolean | `true` | Enable or disable job monitoring |
| `jobsInterval` | Integer | `30000` | How often to collect queue statistics (ms) |
Steve Jobs uses a single "default" queue. The agent tracks job lifecycle through MongoDB observer callbacks, so there is no polling overhead for individual job events -- only the periodic stats collection runs on the interval.
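As a rough illustration of the observer-based approach, the agent only needs to translate document state transitions into lifecycle events. The function below is a hypothetical sketch, not SkySignal's actual code, and the state names (`pending`, `running`, `success`, `failure`) are assumptions for illustration:

```javascript
// Hypothetical sketch: map a jobs_data state transition (as seen by a
// MongoDB observer callback) to a lifecycle event. No polling involved --
// this runs only when the observed document actually changes.
function lifecycleEvent(prevState, nextState) {
  if (prevState === "pending" && nextState === "running") return "job.started";
  if (prevState === "running" && nextState === "success") return "job.completed";
  if (prevState === "running" && nextState === "failure") return "job.failed";
  return null; // other transitions produce no event
}

console.log(lifecycleEvent("pending", "running")); // job.started
```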
BullMQ
BullMQ is a Redis-backed job queue. The agent discovers queues by scanning Redis for BullMQ key patterns (bull:*:meta) and attaches QueueEvents listeners for real-time job tracking.
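To make the discovery mechanism concrete, here is a minimal sketch of how queue names can be recovered from keys matching `bull:*:meta`. The helper name is hypothetical; SkySignal's internals may differ:

```javascript
// Illustrative sketch: extract BullMQ queue names from Redis keys.
// BullMQ stores per-queue metadata under "bull:<queueName>:meta".
function queueNamesFromKeys(keys) {
  const names = new Set();
  for (const key of keys) {
    const match = /^bull:(.+):meta$/.exec(key);
    if (match) names.add(match[1]);
  }
  return [...names].sort();
}

// Example input: keys as returned by SCAN with MATCH "bull:*:meta"
console.log(queueNamesFromKeys(["bull:emailQueue:meta", "bull:reportQueue:meta"]));
// prints the two queue names, emailQueue and reportQueue
```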
Requirements:
- `bullmq` npm package installed (`npm install bullmq`)
- `ioredis` npm package installed (`npm install ioredis`) -- used for queue discovery
- A running Redis instance
Configuration:
```json
{
  "skysignal": {
    "collectJobs": true,
    "jobsInterval": 30000,
    "bullmqRedis": {
      "host": "localhost", // Redis host
      "port": 6379         // Redis port
    }
  }
}
```
| Option | Type | Default | Description |
|---|---|---|---|
| `collectJobs` | Boolean | `true` | Enable or disable job monitoring |
| `jobsInterval` | Integer | `30000` | How often to collect queue statistics (ms) |
| `bullmqRedis` | Object | `{ "host": "localhost", "port": 6379 }` | Redis connection for queue discovery |
| `bullmqQueues` | Array | `[]` | Manually specify queues to monitor (see below) |
| `detailedTracking` | Boolean | `true` | Fetch full job details on failure for stack traces |
| `jobCacheMaxSize` | Integer | `2000` | Max entries in the job detail cache |
| `jobCacheTTL` | Integer | `120000` | Job cache entry TTL in ms |
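The `jobCacheMaxSize` and `jobCacheTTL` options bound a size- and time-limited cache. A minimal sketch of such a cache is shown below; the eviction policy (drop the oldest insertion when full) is an assumption for illustration, not necessarily what the agent does:

```javascript
// Minimal sketch of a bounded TTL cache like the agent's job detail cache.
class TtlCache {
  constructor(maxSize = 2000, ttlMs = 120000) {
    this.maxSize = maxSize;
    this.ttlMs = ttlMs;
    this.map = new Map(); // insertion-ordered: key -> { value, expiresAt }
  }
  set(key, value, now = Date.now()) {
    if (this.map.size >= this.maxSize && !this.map.has(key)) {
      // Evict the oldest entry to respect maxSize
      this.map.delete(this.map.keys().next().value);
    }
    this.map.set(key, { value, expiresAt: now + this.ttlMs });
  }
  get(key, now = Date.now()) {
    const entry = this.map.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= now) {
      this.map.delete(key); // expired: drop it
      return undefined;
    }
    return entry.value;
  }
}
```

Lowering either parameter trades detail retention for memory, which is why the troubleshooting section below suggests reducing them under high throughput.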
Manual queue configuration:
If the agent cannot discover your queues via Redis scanning (e.g., non-standard Redis key prefix), you can list them explicitly:
```json
{
  "skysignal": {
    "collectJobs": true,
    "bullmqQueues": [
      {
        "name": "emailQueue",
        "connection": { "host": "redis.example.com", "port": 6379 }
      },
      {
        "name": "reportQueue"
      }
    ]
  }
}
```
Queues listed in bullmqQueues are monitored immediately on startup. The agent also performs periodic Redis scans to discover new queues that are created after the app starts.
BullMQ queues are discovered by scanning Redis for keys matching bull:*:meta. If your queues don't appear in the dashboard, make sure the agent's Redis connection matches the one your app uses, or list the queues manually via bullmqQueues.
Forcing a Specific Package
If your app has both packages installed but you only want to monitor one:
```json
{
  "skysignal": {
    "jobsPackage": "bullmq"
  }
}
```
| Option | Type | Default | Description |
|---|---|---|---|
| `jobsPackage` | String | Auto-detect | Force a specific package: `"msavin:sjobs"` or `"bullmq"` |
What Gets Tracked
For every job that runs through your queue, the agent captures:
| Field | Description |
|---|---|
| `jobId` | Unique identifier for the job |
| `jobName` | Job name / type (e.g., `"sendWelcomeEmail"`) |
| `jobType` | Auto-inferred category (email, report, sync, etc.) |
| `queueName` | Queue the job belongs to (`"default"` for Steve Jobs, queue name for BullMQ) |
| `jobsPackage` | Originating package (`"msavin:sjobs"` or `"bullmq"`) |
| `status` | Current state: `pending`, `running`, `completed`, `failed`, `stalled`, `cancelled` |
| `queuedAt` | When the job was added to the queue |
| `startedAt` | When execution began |
| `completedAt` | When execution finished |
| `duration` | Execution time in milliseconds |
| `delay` | Time spent waiting in queue before execution (ms) |
| `attempts` | Number of execution attempts |
| `error` | Error details if the job failed (message, stack trace) |
| `priority` | Job priority level |
| `progress` | Progress percentage for long-running jobs (0-100) |
| `host` | Server hostname that processed the job |
| `originatingTraceId` | Trace ID of the Meteor Method that enqueued the job |
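Putting the fields together, a tracked job record might look like the following. This is a hypothetical example with made-up values, not an exact wire format:

```json
{
  "jobId": "42871",
  "jobName": "sendWelcomeEmail",
  "jobType": "email",
  "queueName": "emailQueue",
  "jobsPackage": "bullmq",
  "status": "completed",
  "queuedAt": "2024-05-01T12:00:00.000Z",
  "startedAt": "2024-05-01T12:00:01.250Z",
  "completedAt": "2024-05-01T12:00:02.100Z",
  "duration": 850,
  "delay": 1250,
  "attempts": 1,
  "priority": 0,
  "progress": 100,
  "host": "web-1",
  "originatingTraceId": "trace-abc123"
}
```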
Job Status Lifecycle
```
pending → running → completed
                  → failed
                  → stalled → failed (auto after 30 min)
pending → cancelled
```
Jobs that remain in `running` status for more than 30 minutes without a completion event are automatically marked as failed with a `StaleJobTimeout` error. This handles cases where the monitored app crashes mid-job.
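The 30-minute rule reduces to a simple check. The sketch below is illustrative; the field names are assumptions:

```javascript
// Sketch of the stale-job rule: a job still "running" past the
// 30-minute cutoff is treated as failed (StaleJobTimeout).
const STALE_TIMEOUT_MS = 30 * 60 * 1000;

function isStale(job, now = Date.now()) {
  return job.status === "running" && now - job.startedAt > STALE_TIMEOUT_MS;
}
```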
Stalled Job Detection
Both monitors detect stalled jobs, but through different mechanisms:
- Steve Jobs: A job is considered stalled when its `due` time is more than 5 minutes in the past and its state is still `pending`
- BullMQ: Uses BullMQ's built-in `stalled` event, which fires when a worker fails to renew its lock within the configured stall interval
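The Steve Jobs rule is a straightforward predicate over the job document. As a hedged sketch (field names are assumptions for illustration):

```javascript
// Sketch of the Steve Jobs stall rule: due more than 5 minutes ago
// while the job is still pending.
const STALL_THRESHOLD_MS = 5 * 60 * 1000;

function isStalledSteveJob(job, now = Date.now()) {
  return job.state === "pending" && now - job.due > STALL_THRESHOLD_MS;
}
```

The BullMQ side needs no such check, since the `stalled` event is emitted by BullMQ itself.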
Trace Correlation
When a Meteor Method enqueues a background job, SkySignal links the job back to the originating method trace. This lets you follow the full request lifecycle from the initial Method call through to job completion.
How it works:
- Steve Jobs: The agent wraps `Jobs.run()` and captures the current method's `traceId` from `AsyncLocalStorage`. When the job document appears in MongoDB, the agent matches it to the pending trace context.
- BullMQ: The agent wraps `Queue.add()` and `Queue.addBulk()` to inject a `__skysignal_traceId` field into the job data. When the job executes, the agent extracts the trace ID from the job payload.
Trace correlation is automatic -- no code changes required. In the dashboard, jobs with an originating trace show a link to the parent method trace.
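For intuition, the BullMQ-side wrapping can be sketched as below. `getCurrentTraceId` is a hypothetical callback standing in for the agent's `AsyncLocalStorage` lookup; only the `__skysignal_traceId` field name comes from the description above:

```javascript
// Illustrative sketch: wrap a queue's add() so every enqueued job's
// payload carries the trace ID of the method that enqueued it.
function instrumentQueue(queue, getCurrentTraceId) {
  const originalAdd = queue.add.bind(queue);
  queue.add = (name, data, opts) => {
    const traceId = getCurrentTraceId();
    const payload = traceId ? { ...data, __skysignal_traceId: traceId } : data;
    return originalAdd(name, payload, opts);
  };
  return queue;
}
```

On the worker side, the agent would read `job.data.__skysignal_traceId` back out when the job runs, linking it to the parent trace.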
Dashboard
Navigate to your site's Jobs tab to see the monitoring data.
Overview
The top of the page shows four summary cards:
- Total Jobs -- Count of jobs in the selected time range, with throughput (jobs/min)
- Success Rate -- Percentage of jobs that completed successfully
- Avg Execution -- Average job duration, plus the longest job in the period
- Failed Jobs -- Count of failed and stalled jobs that need attention
Filtering
Two filter dropdowns appear in the tab bar when applicable:
- Package filter -- Shown when your app uses multiple job packages. Filter by `msavin:sjobs`, `bullmq`, or view all.
- Queue filter -- Shown when multiple queues exist. Filter by a specific queue or view all.
Both filters apply to all tabs: Running, Failed, Scheduled, Recent Jobs, Performance, and Analytics.
Tabs
Running -- Currently executing jobs with live duration, progress bars, and the ability to cancel.
Failed -- Jobs that failed or stalled, with error details. Stalled jobs are highlighted with a warning badge on the tab.
Scheduled -- Pending jobs waiting to execute, including scheduled future runs and repeat patterns.
Recent Jobs -- Full history of recent job executions. Click a row to expand and see job details, error stack traces, and a link to the originating method trace.
Performance -- Per-job-type performance breakdown with P50/P95/P99 duration percentiles, failure rates, throughput, and trend comparison against the previous period.
Analytics -- Queue depth history over time, failure rate trend charts, and worker utilization gauges per queue.
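The P50/P95/P99 figures on the Performance tab are duration percentiles. As a rough sketch using the nearest-rank method (an assumption; SkySignal's exact method is not specified):

```javascript
// Nearest-rank percentile over a list of job durations in ms.
function percentile(durationsMs, p) {
  const sorted = [...durationsMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

const durations = [100, 120, 130, 150, 400, 900, 1200, 90, 110, 105];
console.log(percentile(durations, 95)); // 1200
```

A large gap between P50 and P99, as in this example, usually points to a slow outlier path within a job type rather than uniformly slow jobs.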
Best Practices
Monitor failure rates
A failure rate above 5% usually indicates a systemic issue. Use the Performance tab to identify which job types are failing most, then check the Failed tab for error details and stack traces.
Watch queue depth
A growing queue depth means jobs are being enqueued faster than they can be processed. Either increase worker concurrency or optimize job execution time. The Analytics tab shows queue depth over time so you can spot trends.
Use trace correlation
When a job fails, check if it has an originating trace. The parent Method call often provides context about why the job was created with bad data or under unusual conditions.
Set appropriate intervals
The default jobsInterval of 30 seconds works well for most apps. For high-throughput queues (thousands of jobs per minute), consider increasing it to 60 seconds to reduce overhead. Job lifecycle events (start, complete, fail) are always tracked in real-time regardless of the interval setting.
```json
{
  "skysignal": {
    "collectJobs": true,
    "jobsInterval": 60000
  }
}
```
Troubleshooting
Jobs Not Appearing in Dashboard
- Verify `collectJobs` is `true` (or not set -- it defaults to `true`)
- Check that your job package is installed and running
- Enable debug mode to see agent logs: `"debug": true`
- For BullMQ, verify the Redis connection matches your app's Redis
BullMQ Queues Not Discovered
- Check that `ioredis` is installed (`npm ls ioredis`)
- Verify the `bullmqRedis` connection settings match your Redis instance
- Try listing queues manually via `bullmqQueues`
- Check Redis for BullMQ keys: `redis-cli KEYS "bull:*:meta"`
Steve Jobs Observer Not Starting
- Verify the `Jobs` global is available (`typeof Jobs` in `meteor shell`)
- Check that `Jobs.collection` exists
- The observer requires server-side Meteor reactivity -- make sure you are not running in a non-standard Meteor environment
High Memory Usage
If you have a very high-throughput BullMQ setup, the job detail cache can grow. Reduce jobCacheMaxSize or jobCacheTTL:
```json
{
  "skysignal": {
    "jobCacheMaxSize": 500,
    "jobCacheTTL": 60000
  }
}
```
Stale Jobs Accumulating
Jobs stuck in `running` for over 30 minutes are auto-marked as failed. If you see many stale jobs, it usually means:
- Your app crashed or restarted while jobs were running
- Job execution exceeds 30 minutes (consider splitting long jobs into smaller steps)
- The agent lost connection to the monitored app
Next Steps
- Configuration - Full configuration reference
- Method Tracing - Understand trace correlation with jobs
- Error Tracking - Track errors across methods and jobs
- Performance Optimization - Optimize job throughput