Description
Move QoS Guard Functionality from Celery to Kombu
Background
This issue is related to Celery PR #9863, which implements a "pragmatic Celery‑side QoS guard" to prevent worker stalls and infinite loops. The PR author notes:
This is a pragmatic Celery‑side QoS guard. Longer‑term, the ideal place for this behavior would be in Kombu's QoS, with Celery surfacing the option.
Problem Statement
Celery currently implements QoS guard logic to prevent critical issues with message consumption, but this functionality belongs in Kombu's QoS layer for better architecture and reusability.
Issues the QoS Guard Addresses:
- Worker Stalls: When prefetch is disabled or misconfigured, workers can stop processing messages entirely
- Infinite Loops: Certain transport/prefetch combinations cause qos.can_consume() to behave incorrectly, leading to 100% CPU usage
- Transport Incompatibilities: Some transports (like SQS) don't support traditional AMQP prefetch semantics properly
- Head-of-Line Blocking: Long-running tasks can cause starvation when prefetch holds tasks in reserve while workers sit idle
Current State
Celery PR #9863 implements a QoS guard in Celery that wraps channel.qos.can_consume to check reserved_requests against effective concurrency. This prevents workers from fetching new tasks when all execution slots are busy, solving head-of-line blocking issues.
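The wrapping described above can be sketched roughly as follows. This is an illustrative sketch, not the code from the PR: FakeQoS is a stand-in for a channel's QoS object, and install_guard, reserved_requests as a plain set, and get_concurrency are hypothetical names chosen for the example.

```python
class FakeQoS:
    """Stand-in for a channel's QoS object (hypothetical, not Kombu's class)."""

    def __init__(self, prefetch_count=0):
        self.prefetch_count = prefetch_count
        self.in_flight = 0

    def can_consume(self):
        # A prefetch_count of 0 is treated here as "no limit".
        return not self.prefetch_count or self.in_flight < self.prefetch_count


def install_guard(qos, reserved_requests, get_concurrency):
    """Wrap qos.can_consume so it refuses new messages when all slots are busy.

    Returns the original bound method so the caller can restore it later.
    """
    original = qos.can_consume

    def guarded_can_consume():
        # Refuse to fetch when every execution slot already holds a request.
        if len(reserved_requests) >= get_concurrency():
            return False
        # Otherwise defer to the transport's own prefetch decision.
        return original()

    qos.can_consume = guarded_can_consume
    return original


# Usage: prefetch is unlimited, but the guard caps intake at 2 busy slots.
qos = FakeQoS(prefetch_count=0)
reserved = set()
install_guard(qos, reserved, lambda: 2)

results = [qos.can_consume()]      # no reserved requests yet
reserved.update({"t1", "t2"})
results.append(qos.can_consume())  # both slots busy
reserved.discard("t1")
results.append(qos.can_consume())  # one slot free again
```

Because the guard only adds an extra "no" before delegating, the transport's own prefetch check still applies whenever a slot is free.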
Proposed Solution
Move this QoS guard functionality from Celery to Kombu's QoS implementation, where it architecturally belongs. This would:
- Centralize QoS Logic: Put transport-aware QoS behavior in the transport layer
- Enable Reuse: Other Kombu consumers besides Celery can benefit from the guard
- Improve Maintainability: Reduce code duplication between projects
- Better Abstraction: Allow Celery to simply surface configuration options rather than implement guard logic
Implementation Approach
- Add configurable QoS guard functionality to Kombu's existing QoS classes
- Provide transport options to enable/configure the guard behavior
- Ensure backward compatibility (disabled by default)
- Allow Celery to migrate from its internal guard to Kombu's implementation
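One possible shape for the steps above is an opt-in check inside the QoS class itself, driven by transport options. The sketch below is self-contained and does not use Kombu's real classes; GuardedQoS and the option names qos_guard and qos_guard_limit are assumptions made for illustration, not existing Kombu settings.

```python
class GuardedQoS:
    """Hypothetical QoS class with an opt-in guard (not Kombu's actual API)."""

    def __init__(self, prefetch_count=0, transport_options=None):
        opts = transport_options or {}
        self.prefetch_count = prefetch_count
        # Disabled by default, so existing behavior is unchanged.
        self.guard_enabled = bool(opts.get("qos_guard", False))
        self.guard_limit = opts.get("qos_guard_limit")
        self._unacked = 0

    def append(self):
        """Record that a message was delivered and is awaiting ack."""
        self._unacked += 1

    def ack(self):
        """Record that a delivered message was acknowledged."""
        self._unacked = max(0, self._unacked - 1)

    def can_consume(self):
        # Guard check: only active when explicitly enabled with a limit.
        if (
            self.guard_enabled
            and self.guard_limit is not None
            and self._unacked >= self.guard_limit
        ):
            return False
        # Ordinary prefetch check; 0 is treated here as "no limit".
        return not self.prefetch_count or self._unacked < self.prefetch_count


# Default configuration: the guard is off and nothing changes.
default_qos = GuardedQoS()
for _ in range(5):
    default_qos.append()
default_ok = default_qos.can_consume()

# Opt-in configuration: at most 2 unacknowledged messages at a time.
guarded = GuardedQoS(transport_options={"qos_guard": True, "qos_guard_limit": 2})
guarded.append()
guarded.append()
blocked = guarded.can_consume()
guarded.ack()
resumed = guarded.can_consume()
```

Under this design, Celery would only need to pass its effective concurrency as the limit, rather than wrapping can_consume itself.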
Benefits
- Better Architecture: QoS logic belongs in the transport layer, not the application layer
- Code Reuse: Other Kombu-based applications can benefit from the guard functionality
- Reduced Complexity: Celery can focus on task execution rather than transport-level QoS issues
- Improved Reliability: Handling QoS in one place should reduce the surface for bugs and transport-specific edge cases
Related Issues
- Celery PR #9863: Current Celery-side QoS guard implementation
- Kombu Issue #685: SQS prefetch infinite loop issues
- Celery Issue #6067: Per-consumer QoS workarounds
Acceptance Criteria
- QoS guard functionality is available in Kombu's QoS classes
- Guard behavior is configurable via transport options
- Existing Kombu/Celery code continues to work unchanged (backward compatible)
- Clear documentation on how to enable and configure the guard
- Celery can eventually migrate from its internal guard to Kombu's implementation