
Jobs got stuck - need advice #2797

@antepetkovic0

Description


Hi guys,

We’re experiencing issues with Redis memory usage and job states in our queue system after enabling automatic scaling on our API services.
Our services automatically scale up and down based on the number of incoming requests. Each API instance creates and connects to a queue to process incoming jobs. However, we haven’t implemented a graceful shutdown process, meaning queues aren’t explicitly closed (queue.close()) when a service shuts down.
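For context, this is roughly the graceful shutdown we are considering adding (a sketch only; the queue name and SIGTERM handling are assumptions, not our exact setup):

```ts
// Sketch of the graceful shutdown we plan to add (assumes Bull).
// The queue name and the SIGTERM hook are illustrative.
import Queue from 'bull';

const jobQueue = new Queue('api-jobs', process.env.REDIS_URL ?? 'redis://127.0.0.1:6379');

process.on('SIGTERM', async () => {
  // close() waits for the currently active job to finish and then
  // releases the Redis connections held by this instance.
  await jobQueue.close();
  process.exit(0);
});
```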

During scale-up events, we noticed Redis memory usage steadily increased but never returned to its original level. Although we remove both completed and failed jobs, memory usage kept growing. Eventually, Redis reached 100% memory utilization, causing downtime because no new jobs could be processed.
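For reference, removal of finished jobs in our setup looks roughly like this (a simplified sketch with placeholder names; the exact per-queue options differ):

```ts
// Simplified sketch of how we already drop finished jobs (assumes Bull).
// Even with these options, memory kept growing, presumably because the
// waiting (not yet processed) jobs themselves kept accumulating.
import Queue from 'bull';

const jobQueue = new Queue('api-jobs', process.env.REDIS_URL ?? 'redis://127.0.0.1:6379');

await jobQueue.add(
  { payload: 'example' },
  {
    removeOnComplete: true, // delete the job data from Redis on success
    removeOnFail: true,     // and on failure
  }
);
```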

After increasing the Redis memory limit and disabling auto-scaling, everything stabilized. However, we now have around 4 million jobs in a “stuck” state (job.getState() === "stuck").

Questions

  • Is there any way to move these "stuck" jobs back to the "waiting" or "active" state?
  • If not, what’s the best way to safely delete them: should we filter by timestamp, or by checking job.getState() === "stuck" (the documentation advises that this is not performant)? Rough sketches of both approaches follow this list.
  • Could this issue be caused by improper scale-down behaviour (e.g., shutting down services without properly closing their queues)?
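To make the second question concrete, this is how we imagine both cleanup approaches would look (a sketch only, assuming Bull with its default incrementing numeric job IDs; the ID range and cutoff are placeholders, and we’re not sure either approach is safe at ~4M jobs):

```ts
// Sketch of the two cleanup approaches we are weighing (assumes Bull).
// FIRST_ID, LAST_ID and CUTOFF are placeholders, not real values.
import Queue from 'bull';

const jobQueue = new Queue('api-jobs', process.env.REDIS_URL ?? 'redis://127.0.0.1:6379');

const FIRST_ID = 1;                                  // placeholder
const LAST_ID = 4_000_000;                           // placeholder
const CUTOFF = Date.now() - 7 * 24 * 60 * 60 * 1000; // placeholder: "older than a week"

for (let id = FIRST_ID; id <= LAST_ID; id++) {
  const job = await jobQueue.getJob(id);
  if (!job) continue;

  // Approach 1: filter purely by creation timestamp (job.timestamp is set
  // when the job is added), avoiding the extra getState() round trip.
  const tooOld = job.timestamp < CUTOFF;

  // Approach 2: check the state explicitly, which the documentation warns
  // is not performant at this scale.
  const isStuck = (await job.getState()) === 'stuck';

  if (tooOld || isStuck) {
    await job.remove();
  }
}
```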

Additional Notes

  • All of the stuck jobs belong to rate-limited queues (configured roughly as in the sketch after this list), so we assume the 4M jobs were added to the queue before the memory limit was reached and were simply waiting to be processed
  • We have other queues that are not rate-limited, and all of their jobs were processed
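For clarity, the rate-limited queues are configured roughly like this (illustrative values only; our real max/duration are different):

```ts
// Illustrative sketch of a rate-limited queue (assumes Bull's limiter option).
import Queue from 'bull';

const rateLimitedQueue = new Queue('rate-limited-jobs', process.env.REDIS_URL ?? 'redis://127.0.0.1:6379', {
  limiter: {
    max: 100,       // at most 100 jobs processed...
    duration: 1000, // ...per 1000 ms
  },
});
```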
