Background Jobs

SkySignal monitors background job queues in your Meteor application, giving you visibility into job execution, failures, queue depth, and worker utilization. The agent auto-detects your job package and begins tracking without any code changes.

Supported Packages

Package	Identifier	Storage	Auto-Detected
msavin:sjobs (Steve Jobs)	`msavin:sjobs`	MongoDB (`jobs_data` collection)	Yes
BullMQ	`bullmq`	Redis	Yes

The agent checks for each package at startup and monitors every one it finds. If your app uses both packages, each job is tagged with its originating package so you can filter and compare them in the dashboard.

msavin:sjobs (Steve Jobs)

Steve Jobs is a MongoDB-backed job queue built for Meteor. The agent monitors it by observing the jobs_data collection for state changes.

Requirements:

msavin:sjobs package installed (meteor add msavin:sjobs)
No additional configuration needed -- the agent detects the global Jobs object automatically

Configuration:

{
  "skysignal": {
    "collectJobs": true,
    "jobsInterval": 30000
  }
}

Option	Type	Default	Description
`collectJobs`	Boolean	`true`	Enable or disable job monitoring
`jobsInterval`	Integer	`30000`	How often to collect queue statistics (ms)

Steve Jobs uses a single "default" queue. The agent tracks job lifecycle through MongoDB observer callbacks, so there is no polling overhead for individual job events -- only the periodic stats collection runs on the interval.

BullMQ

BullMQ is a Redis-backed job queue. The agent discovers queues by scanning Redis for BullMQ key patterns (bull:*:meta) and attaches QueueEvents listeners for real-time job tracking.

Requirements:

bullmq npm package installed (npm install bullmq)
ioredis npm package installed (npm install ioredis) -- used for queue discovery
A running Redis instance

Configuration:

{
  "skysignal": {
    "collectJobs": true,
    "jobsInterval": 30000,
    "bullmqRedis": {
      "host": "localhost",
      "port": 6379
    }
  }
}

Option	Type	Default	Description
`collectJobs`	Boolean	`true`	Enable or disable job monitoring
`jobsInterval`	Integer	`30000`	How often to collect queue statistics (ms)
`bullmqRedis`	Object	`{ host: "localhost", port: 6379 }`	Redis connection for queue discovery
`bullmqQueues`	Array	`[]`	Manually specify queues to monitor (see below)
`detailedTracking`	Boolean	`true`	Fetch full job details (including stack traces) when a job fails
`jobCacheMaxSize`	Integer	`2000`	Max entries in the job detail cache
`jobCacheTTL`	Integer	`120000`	Job cache entry TTL in ms
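The two cache options bound how much job detail the agent holds in memory. The agent's internal cache is not documented here, so the following is only a minimal sketch, assuming a Map-based cache that evicts the oldest write once `jobCacheMaxSize` is exceeded and treats entries older than `jobCacheTTL` as misses. The class name `JobDetailCache` is hypothetical.

```typescript
// Hypothetical sketch of a job-detail cache bounded by size and age, modeled
// on jobCacheMaxSize and jobCacheTTL. Not the agent's actual implementation.
interface CacheEntry<V> {
  value: V;
  storedAt: number; // epoch ms when the entry was written
}

class JobDetailCache<V> {
  private readonly entries = new Map<string, CacheEntry<V>>();
  private readonly maxSize: number;
  private readonly ttlMs: number;
  private readonly now: () => number;

  // Defaults mirror jobCacheMaxSize (2000) and jobCacheTTL (120000 ms).
  constructor(maxSize = 2000, ttlMs = 120000, now: () => number = Date.now) {
    this.maxSize = maxSize;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  set(jobId: string, value: V): void {
    // Delete first so a re-inserted key moves to the end of the Map's order.
    this.entries.delete(jobId);
    this.entries.set(jobId, { value, storedAt: this.now() });
    // Map iterates in insertion order, so the first key is the oldest write.
    while (this.entries.size > this.maxSize) {
      const oldest = this.entries.keys().next().value;
      if (oldest === undefined) break;
      this.entries.delete(oldest);
    }
  }

  get(jobId: string): V | undefined {
    const entry = this.entries.get(jobId);
    if (entry === undefined) return undefined;
    if (this.now() - entry.storedAt > this.ttlMs) {
      // Expired entries are removed lazily on read.
      this.entries.delete(jobId);
      return undefined;
    }
    return entry.value;
  }

  get size(): number {
    return this.entries.size;
  }
}
```

With the defaults, at most 2000 job details are held and anything older than two minutes is treated as a miss. Lowering either value caps memory at the cost of re-fetching details more often.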

Manual queue configuration:

If the agent cannot discover your queues via Redis scanning (for example, because they use a non-standard Redis key prefix), you can list them explicitly:

{
  "skysignal": {
    "collectJobs": true,
    "bullmqQueues": [
      {
        "name": "emailQueue",
        "connection": { "host": "redis.example.com", "port": 6379 }
      },
      {
        "name": "reportQueue"
      }
    ]
  }
}

Queues listed in bullmqQueues are monitored immediately on startup. The agent also performs periodic Redis scans to discover new queues that are created after the app starts.
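One way to picture how manually listed queues and later scan results fit together is a merge keyed by queue name. This is a sketch under an explicit assumption: an entry without `connection` (like `reportQueue` above) falls back to the `bullmqRedis` settings, which the configuration table implies but does not state. The function `resolveQueues` is hypothetical.

```typescript
// Hypothetical merge of bullmqQueues entries with queues found by Redis scans.
interface RedisConnection {
  host: string;
  port: number;
}

interface QueueConfig {
  name: string;
  connection?: RedisConnection;
}

function resolveQueues(
  manual: QueueConfig[],
  discovered: string[],
  // Assumption: falls back to bullmqRedis, whose default is localhost:6379.
  defaultConnection: RedisConnection = { host: "localhost", port: 6379 },
): Map<string, RedisConnection> {
  const result = new Map<string, RedisConnection>();
  // Manually listed queues are registered first and keep their own connection.
  for (const q of manual) {
    result.set(q.name, q.connection ?? defaultConnection);
  }
  // Queues found by a later scan are added without overriding manual entries.
  for (const name of discovered) {
    if (!result.has(name)) result.set(name, defaultConnection);
  }
  return result;
}
```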

Queue discovery

BullMQ queues are discovered by scanning Redis for keys matching bull:*:meta. If your queues don't appear in the dashboard, make sure the agent's Redis connection matches the one your app uses, or list the queues manually via bullmqQueues.
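Turning matching keys into queue names is a matter of stripping the prefix and the `:meta` suffix. The sketch below assumes BullMQ's default key prefix, `bull`. A custom prefix is exactly the case where a `bull:*:meta` scan finds nothing. In a real setup the keys would come from a non-blocking SCAN (ioredis exposes `scanStream({ match })`); the function name here is hypothetical.

```typescript
// Hypothetical helper: extract queue names from keys matching <prefix>:*:meta.
function queueNamesFromMetaKeys(keys: string[], prefix = "bull"): string[] {
  const head = `${prefix}:`;
  const tail = ":meta";
  const names = new Set<string>();
  for (const key of keys) {
    // Keep only keys shaped like "bull:<queueName>:meta" with a non-empty name.
    if (key.startsWith(head) && key.endsWith(tail) && key.length > head.length + tail.length) {
      names.add(key.slice(head.length, key.length - tail.length));
    }
  }
  return Array.from(names).sort();
}
```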

Forcing a Specific Package

If your app has both packages installed but you only want to monitor one:

{
  "skysignal": {
    "jobsPackage": "bullmq"
  }
}

Option	Type	Default	Description
`jobsPackage`	String	Auto-detect	Force a specific package: `"msavin:sjobs"` or `"bullmq"`
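The selection rule reduces to a small function: monitor every detected package unless `jobsPackage` names one. This is a sketch, and the behavior when the forced package is not installed (no monitors) is an assumption rather than documented behavior. `packagesToMonitor` is a hypothetical name.

```typescript
// Hypothetical sketch of package selection with an optional jobsPackage override.
type JobsPackage = "msavin:sjobs" | "bullmq";

function packagesToMonitor(detected: JobsPackage[], forced?: JobsPackage): JobsPackage[] {
  // No override: every detected package is monitored.
  if (forced === undefined) return detected;
  // Assumption: forcing a package that is not installed yields no monitors.
  return detected.indexOf(forced) !== -1 ? [forced] : [];
}
```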

What Gets Tracked

For every job that runs through your queue, the agent captures:

Field	Description
`jobId`	Unique identifier for the job
`jobName`	Job name / type (e.g., `"sendWelcomeEmail"`)
`jobType`	Auto-inferred category (email, report, sync, etc.)
`queueName`	Queue the job belongs to (`"default"` for Steve Jobs, queue name for BullMQ)
`jobsPackage`	Originating package (`"msavin:sjobs"` or `"bullmq"`)
`status`	Current state: `pending`, `running`, `completed`, `failed`, `stalled`, `cancelled`
`queuedAt`	When the job was added to the queue
`startedAt`	When execution began
`completedAt`	When execution finished
`duration`	Execution time in milliseconds
`delay`	Time spent waiting in queue before execution (ms)
`attempts`	Number of execution attempts
`error`	Error details if the job failed (message, stack trace)
`priority`	Job priority level
`progress`	Progress percentage for long-running jobs (0-100)
`host`	Server hostname that processed the job
`originatingTraceId`	Trace ID of the Meteor Method that enqueued the job
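Read as a type, the table describes a record like the one below. This is a sketch: storing timestamps as epoch milliseconds is an assumption, and the keyword rules in `inferJobType` are a hypothetical illustration, since the agent's actual inference rules are not documented. The `delay` and `duration` derivations follow directly from the field descriptions.

```typescript
// Hypothetical shape of a tracked job, based on the fields listed above.
type JobStatus = "pending" | "running" | "completed" | "failed" | "stalled" | "cancelled";

interface JobRecord {
  jobId: string;
  jobName: string;
  jobType?: string;
  queueName: string;
  jobsPackage: "msavin:sjobs" | "bullmq";
  status: JobStatus;
  queuedAt: number; // assumed epoch ms
  startedAt?: number; // assumed epoch ms
  completedAt?: number; // assumed epoch ms
  attempts: number;
  error?: { message: string; stack?: string };
  priority?: number;
  progress?: number; // 0-100
  host?: string;
  originatingTraceId?: string;
}

// delay = time waiting in the queue; duration = execution time.
function timings(job: JobRecord): { delay: number | undefined; duration: number | undefined } {
  const delay = job.startedAt !== undefined ? job.startedAt - job.queuedAt : undefined;
  const duration =
    job.startedAt !== undefined && job.completedAt !== undefined
      ? job.completedAt - job.startedAt
      : undefined;
  return { delay, duration };
}

// Hypothetical keyword heuristic for jobType; not the agent's real rules.
const TYPE_KEYWORDS: Array<[string, string[]]> = [
  ["email", ["email", "mail", "newsletter"]],
  ["report", ["report", "export"]],
  ["sync", ["sync", "import"]],
];

function inferJobType(jobName: string): string {
  const name = jobName.toLowerCase();
  for (const [type, keywords] of TYPE_KEYWORDS) {
    if (keywords.some((k) => name.includes(k))) return type;
  }
  return "other";
}
```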

Job Status Lifecycle

pending → running → completed
                  → failed
                  → stalled → failed (auto after 30 min)
pending → cancelled

Jobs that remain in running status for more than 30 minutes without a completion event are automatically marked as failed with a StaleJobTimeout error. This handles cases where the monitored app crashes mid-job.
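The 30-minute rule can be expressed as a periodic sweep over tracked jobs. In this sketch only the threshold and the `StaleJobTimeout` error name come from this page; the record shape and the `sweepStaleJobs` function are hypothetical.

```typescript
// Hypothetical sweep applying the 30-minute stale-job rule.
const STALE_AFTER_MS = 30 * 60 * 1000;

interface TrackedJob {
  jobId: string;
  status: "pending" | "running" | "completed" | "failed" | "stalled" | "cancelled";
  startedAt?: number; // epoch ms
  error?: { name: string; message: string };
}

function sweepStaleJobs(jobs: TrackedJob[], now: number): TrackedJob[] {
  return jobs.map((job): TrackedJob => {
    const stuck =
      job.status === "running" &&
      job.startedAt !== undefined &&
      now - job.startedAt > STALE_AFTER_MS;
    if (!stuck) return job;
    // No completion event arrived in time, e.g. the app crashed mid-job.
    return {
      ...job,
      status: "failed",
      error: { name: "StaleJobTimeout", message: "No completion event within 30 minutes" },
    };
  });
}
```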

Stalled Job Detection

Both monitors detect stalled jobs, but through different mechanisms:

Steve Jobs: A job is considered stalled when its due time is more than 5 minutes in the past and its state is still pending
BullMQ: Uses BullMQ's built-in stalled event, which fires when a worker fails to renew its lock within the configured stall interval
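For Steve Jobs, the rule is a check of two fields. The sketch assumes a job document with `state` and `due` fields matching the description above; BullMQ needs no equivalent because its `stalled` event already does the work.

```typescript
// Hypothetical check for the Steve Jobs stalled rule: still "pending" and
// due more than 5 minutes ago. The document shape is an assumption.
const STALL_GRACE_MS = 5 * 60 * 1000;

interface SteveJobDoc {
  _id: string;
  state: string; // e.g. "pending"
  due: Date; // when the job was scheduled to run
}

function isStalled(doc: SteveJobDoc, now: Date = new Date()): boolean {
  return doc.state === "pending" && now.getTime() - doc.due.getTime() > STALL_GRACE_MS;
}
```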

Trace Correlation

When a Meteor Method enqueues a background job, SkySignal links the job back to the originating method trace. This lets you follow the full request lifecycle from the initial Method call through to job completion.

How it works:

Steve Jobs: The agent wraps Jobs.run() and captures the current method's traceId from AsyncLocalStorage. When the job document appears in MongoDB, the agent matches it to the pending trace context.
BullMQ: The agent wraps Queue.add() and Queue.addBulk() to inject a __skysignal_traceId field into the job data. When the job executes, the agent extracts the trace ID from the job payload.

Trace correlation is automatic -- no code changes required. In the dashboard, jobs with an originating trace show a link to the parent method trace.
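The BullMQ side of this is a wrap-and-tag pattern. The sketch below is illustrative only. It uses Node's `AsyncLocalStorage` to hold the current Method's trace (the page mentions this mechanism for Steve Jobs; assuming it for BullMQ too is a guess), and `InMemoryQueue` is a hypothetical stand-in for a BullMQ `Queue` so the example runs without Redis. Only `add()` is wrapped here, although the agent also wraps `addBulk()`.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Holds the trace of the Meteor Method currently executing (assumption).
interface TraceContext {
  traceId: string;
}
const traceStore = new AsyncLocalStorage<TraceContext>();

type JobData = Record<string, unknown>;

// Hypothetical in-memory stand-in for a BullMQ Queue; only add(name, data) matters.
class InMemoryQueue {
  readonly added: Array<{ name: string; data: JobData }> = [];

  async add(name: string, data: JobData): Promise<void> {
    this.added.push({ name, data });
  }
}

// Wrap add() so a job enqueued inside a traced Method carries its trace ID.
function wrapAdd(queue: InMemoryQueue): void {
  const original = queue.add.bind(queue);
  queue.add = async (name: string, data: JobData): Promise<void> => {
    const ctx = traceStore.getStore();
    const tagged = ctx ? { ...data, __skysignal_traceId: ctx.traceId } : data;
    return original(name, tagged);
  };
}

// Worker side: read the originating trace ID back out of the job payload.
function extractTraceId(data: JobData): string | undefined {
  const value = data["__skysignal_traceId"];
  return typeof value === "string" ? value : undefined;
}
```

Jobs enqueued outside any traced Method pass through unchanged, which is why untraced jobs simply show no parent link.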

Dashboard

Navigate to your site's Jobs tab to see the monitoring data.

Overview

The top of the page shows four summary cards:

Total Jobs -- Count of jobs in the selected time range, with throughput (jobs/min)
Success Rate -- Percentage of jobs that completed successfully
Avg Execution -- Average job duration, plus the longest job in the period
Failed Jobs -- Count of failed and stalled jobs that need attention
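The four cards are simple aggregates over the jobs in the selected range. The dashboard's exact formulas are not documented (for example, whether cancelled or still-running jobs count toward the success rate), so this hypothetical `overview` function makes one choice: success rate is completed jobs divided by completed plus failed plus stalled.

```typescript
// Hypothetical derivation of the overview cards from jobs in a time range.
interface JobSummaryInput {
  jobName: string;
  status: "pending" | "running" | "completed" | "failed" | "stalled" | "cancelled";
  duration?: number; // ms
}

interface OverviewCards {
  totalJobs: number;
  throughputPerMin: number;
  successRate: number; // percent
  avgExecutionMs: number;
  longestJob: { jobName: string; duration: number } | undefined;
  failedJobs: number; // failed + stalled
}

function overview(jobs: JobSummaryInput[], rangeMs: number): OverviewCards {
  const completed = jobs.filter((j) => j.status === "completed").length;
  const failedJobs = jobs.filter((j) => j.status === "failed" || j.status === "stalled").length;
  const finished = completed + failedJobs;
  // Only jobs with a recorded duration contribute to timing cards.
  const timed = jobs.filter(
    (j): j is JobSummaryInput & { duration: number } => j.duration !== undefined,
  );
  const totalDuration = timed.reduce((sum, j) => sum + j.duration, 0);
  let longestJob: { jobName: string; duration: number } | undefined;
  for (const j of timed) {
    if (!longestJob || j.duration > longestJob.duration) {
      longestJob = { jobName: j.jobName, duration: j.duration };
    }
  }
  return {
    totalJobs: jobs.length,
    throughputPerMin: rangeMs > 0 ? jobs.length / (rangeMs / 60000) : 0,
    successRate: finished > 0 ? (completed / finished) * 100 : 0,
    avgExecutionMs: timed.length > 0 ? totalDuration / timed.length : 0,
    longestJob,
    failedJobs,
  };
}
```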

Filtering

Two filter dropdowns appear in the tab bar when applicable:

Package filter -- Shown when your app uses multiple job packages. Filter by msavin:sjobs, bullmq, or view all.
Queue filter -- Shown when multiple queues exist. Filter by specific queue or view all.

Both filters apply to all tabs: Running, Failed, Scheduled, Recent Jobs, Performance, and Analytics.

Tabs

Running -- Currently executing jobs with live duration, progress bars, and the ability to cancel.

Failed -- Jobs that failed or stalled, with error details. Stalled jobs are highlighted with a warning badge on the tab.

Scheduled -- Pending jobs waiting to execute, including scheduled future runs and repeat patterns.

Recent Jobs -- Full history of recent job executions. Click a row to expand and see job details, error stack traces, and a link to the originating method trace.

Performance -- Per-job-type performance breakdown with P50/P95/P99 duration percentiles, failure rates, throughput, and trend comparison against the previous period.

Analytics -- Queue depth history over time, failure rate trend charts, and worker utilization gauges per queue.
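The P50/P95/P99 figures on the Performance tab are duration percentiles per job type. The dashboard's percentile method is not stated; the sketch below uses the nearest-rank method, one common choice, and the function names are hypothetical.

```typescript
// Nearest-rank percentile over an ascending array of durations (ms).
function percentile(sortedMs: number[], p: number): number {
  if (sortedMs.length === 0) return NaN;
  // Integer arithmetic first avoids floating-point drift in p / 100.
  const rank = Math.ceil((p * sortedMs.length) / 100);
  const index = Math.min(Math.max(rank, 1), sortedMs.length) - 1;
  return sortedMs[index] ?? NaN;
}

function durationPercentiles(durationsMs: number[]): { p50: number; p95: number; p99: number } {
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...durationsMs].sort((a, b) => a - b);
  return {
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    p99: percentile(sorted, 99),
  };
}
```

With few samples, P95 and P99 collapse onto the slowest run, so tail percentiles for rare job types should be read with care.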

Best Practices

Monitor failure rates

A failure rate above 5% usually indicates a systemic issue. Use the Performance tab to identify which job types are failing most, then check the Failed tab for error details and stack traces.
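Applying the 5% rule of thumb per job type rather than globally shows which types are dragging the overall rate up. A minimal sketch, assuming a hypothetical input shape where failed and stalled both count as failures:

```typescript
// Hypothetical per-type failure-rate check against a threshold (default 5%).
interface JobOutcome {
  jobType: string;
  status: "completed" | "failed" | "stalled";
}

function failingJobTypes(
  jobs: JobOutcome[],
  threshold = 0.05,
): Array<{ jobType: string; failureRate: number }> {
  const counts = new Map<string, { total: number; failed: number }>();
  for (const job of jobs) {
    const c = counts.get(job.jobType) ?? { total: 0, failed: 0 };
    c.total += 1;
    if (job.status !== "completed") c.failed += 1;
    counts.set(job.jobType, c);
  }
  // Keep types strictly above the threshold, worst first.
  return Array.from(counts.entries())
    .map(([jobType, c]) => ({ jobType, failureRate: c.failed / c.total }))
    .filter((r) => r.failureRate > threshold)
    .sort((a, b) => b.failureRate - a.failureRate);
}
```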

Watch queue depth

A growing queue depth means jobs are being enqueued faster than they can be processed. Either increase worker concurrency or optimize job execution time. The Analytics tab shows queue depth over time so you can spot trends.
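A quick back-of-the-envelope check helps decide between more concurrency and faster jobs. With concurrency c and an average duration of d ms, workers drain at most c × 60000 / d jobs per minute. If the enqueue rate exceeds that, depth grows. The sketch assumes workers are never idle and ignores retries, so treat its numbers as a lower bound on what you need; all names and inputs are hypothetical.

```typescript
// Hypothetical throughput arithmetic for the queue-depth advice.
interface QueueLoad {
  enqueuedPerMin: number;
  avgDurationMs: number;
  concurrency: number; // total jobs that can run in parallel across workers
}

// Best-case jobs finished per minute.
function drainRatePerMin(load: QueueLoad): number {
  return load.concurrency * (60000 / load.avgDurationMs);
}

// Positive result: depth grows by roughly this many jobs per minute.
function depthGrowthPerMin(load: QueueLoad): number {
  return load.enqueuedPerMin - drainRatePerMin(load);
}

// Smallest concurrency that keeps up with the current enqueue rate.
function minConcurrency(load: QueueLoad): number {
  return Math.ceil((load.enqueuedPerMin * load.avgDurationMs) / 60000);
}
```

For example, 600 jobs per minute at 2 seconds each with a concurrency of 10 drains only 300 per minute, so depth grows by about 300 per minute until concurrency reaches 20 or the average duration drops to 1 second.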

Use trace correlation

When a job fails, check if it has an originating trace. The parent Method call often provides context about why the job was created with bad data or under unusual conditions.

Set appropriate intervals

The default jobsInterval of 30 seconds works well for most apps. For high-throughput queues (thousands of jobs per minute), consider increasing it to 60 seconds to reduce overhead. Job lifecycle events (start, complete, fail) are always tracked in real-time regardless of the interval setting.

{
  "skysignal": {
    "collectJobs": true,
    "jobsInterval": 60000
  }
}

Troubleshooting

Jobs Not Appearing in Dashboard

Verify collectJobs is true (or not set -- it defaults to true)
Check that your job package is installed and running
Enable debug mode to see agent logs: "debug": true
For BullMQ, verify the Redis connection matches your app's Redis

BullMQ Queues Not Discovered

Check that ioredis is installed (npm ls ioredis)
Verify the bullmqRedis connection settings match your Redis instance
Try listing queues manually via bullmqQueues
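Check Redis for BullMQ keys: redis-cli --scan --pattern "bull:*:meta" (prefer this to KEYS on a busy production instance, because KEYS blocks Redis while it runs)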

Steve Jobs Observer Not Starting

Verify the Jobs global is available (typeof Jobs in meteor shell)
Check that Jobs.collection exists
The observer requires server-side Meteor reactivity -- make sure you are not running in a non-standard Meteor environment

High Memory Usage

If you have a very high-throughput BullMQ setup, the job detail cache can grow. Reduce jobCacheMaxSize or jobCacheTTL:

{
  "skysignal": {
    "jobCacheMaxSize": 500,
    "jobCacheTTL": 60000
  }
}

Stale Jobs Accumulating

Jobs stuck in running for over 30 minutes are auto-marked as failed. If you see many stale jobs, it usually means:

Your app crashed or restarted while jobs were running
Job execution exceeds 30 minutes (consider splitting long jobs into smaller steps)
The agent lost connection to the monitored app

Next Steps

Configuration - Full configuration reference
Method Tracing - Understand trace correlation with jobs
Error Tracking - Track errors across methods and jobs
Performance Optimization - Optimize job throughput
