Phase 5: Block-budget admission, eviction, and preemption #122
Open
Labels
- area:core (mlxcel-core: MLX FFI, primitives, KV cache, layers)
- area:inference (generation, sampling, decoding, incl. speculative and DRY)
- status:backlog (in the backlog, not yet ready)
- type:enhancement (new features, capabilities, or significant additions)
Context
With physical blocks shared across requests, scheduling and eviction must be tied to the global block budget rather than to per-sequence buffers. A shared prefix block must be returned to the pool once no sequence references it.
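A minimal sketch of the bookkeeping this implies, assuming a simple free list plus per-block reference counts. The names `BlockId`, `BlockPool`, `can_admit`, `allocate`, `retain`, and `release` are hypothetical and are not the mlxcel API: admission compares the blocks a request still needs against the global free count, and a block goes back to the pool only when its last reference is released.

```rust
use std::collections::HashMap;

/// Hypothetical identifier for a physical KV-cache block.
pub type BlockId = u32;

/// Global pool of physical blocks shared by all sequences (illustrative sketch).
pub struct BlockPool {
    free: Vec<BlockId>,               // blocks with no references
    refcounts: HashMap<BlockId, u32>, // live blocks -> number of referencing sequences
}

impl BlockPool {
    pub fn new(total_blocks: u32) -> Self {
        // Reverse so that pop() hands out block 0 first.
        Self { free: (0..total_blocks).rev().collect(), refcounts: HashMap::new() }
    }

    pub fn free_blocks(&self) -> usize {
        self.free.len()
    }

    /// Admission check against the global budget: `new_blocks_needed` counts only
    /// blocks the request must allocate (reused prefix blocks are retained instead).
    pub fn can_admit(&self, new_blocks_needed: usize) -> bool {
        new_blocks_needed <= self.free.len()
    }

    /// Take a fresh block from the free list with a reference count of 1.
    pub fn allocate(&mut self) -> Option<BlockId> {
        let id = self.free.pop()?;
        self.refcounts.insert(id, 1);
        Some(id)
    }

    /// Another sequence starts referencing an already-allocated (e.g. prefix) block.
    pub fn retain(&mut self, id: BlockId) {
        *self.refcounts.get_mut(&id).expect("retain on unallocated block") += 1;
    }

    /// Drop one reference; the block returns to the free list when the count hits 0.
    /// Returns true if the block was freed.
    pub fn release(&mut self, id: BlockId) -> bool {
        let rc = self.refcounts.get_mut(&id).expect("release on unallocated block");
        *rc -= 1;
        if *rc == 0 {
            self.refcounts.remove(&id);
            self.free.push(id);
            true
        } else {
            false
        }
    }
}
```

In this sketch, a second request that shares a prefix calls `retain` on the existing blocks instead of `allocate`, so shared blocks are charged against the budget only once.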
Tasks
- … `PreemptionPolicy`.
- … (`prompt_cache/policy.rs`, `store.rs`).
- … `GET /v1/cache/stats` and the Prometheus gauges (extend `prompt_cache/metrics.rs`).

Acceptance criteria
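For the `PreemptionPolicy` task, one possible shape is sketched here, assuming a trait that picks a victim among running sequences and a newest-first (LIFO) default. `RunningSeq`, `exclusive_blocks`, `NewestFirst`, and `preempt_until_fits` are hypothetical names for illustration, and the trait signature is an assumption rather than the project's design. Only blocks a victim holds exclusively count as reclaimed, since shared prefix blocks stay allocated while other sequences still reference them.

```rust
/// Hypothetical view of a running sequence (illustration only).
#[derive(Debug, Clone)]
pub struct RunningSeq {
    pub id: u64,
    pub arrival: u64,            // admission order; larger means newer
    pub exclusive_blocks: usize, // blocks returned to the pool if this sequence is preempted
}

/// Assumed trait shape: choose the index of the sequence to preempt.
pub trait PreemptionPolicy {
    fn pick_victim(&self, running: &[RunningSeq]) -> Option<usize>;
}

/// Preempt the most recently admitted sequence first.
pub struct NewestFirst;

impl PreemptionPolicy for NewestFirst {
    fn pick_victim(&self, running: &[RunningSeq]) -> Option<usize> {
        running.iter().enumerate().max_by_key(|(_, s)| s.arrival).map(|(i, _)| i)
    }
}

/// Preempt sequences until `needed` blocks are free; returns the preempted ids.
/// Returns None, leaving `running` untouched, if even preempting everything is not enough.
pub fn preempt_until_fits<P: PreemptionPolicy>(
    policy: &P,
    running: &mut Vec<RunningSeq>,
    mut free_blocks: usize,
    needed: usize,
) -> Option<Vec<u64>> {
    let reclaimable: usize = running.iter().map(|s| s.exclusive_blocks).sum();
    if free_blocks + reclaimable < needed {
        return None;
    }
    let mut preempted = Vec::new();
    while free_blocks < needed {
        // A policy may decline to pick a victim; propagate that as None.
        let idx = policy.pick_victim(running.as_slice())?;
        let victim = running.remove(idx);
        free_blocks += victim.exclusive_blocks;
        preempted.push(victim.id);
    }
    Some(preempted)
}
```

Checking the reclaimable total up front means a request that cannot fit even after preempting every running sequence is turned away without disturbing the running set.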
Dependencies
Blocked by Phase 4 (radix-trie / block-pool unification).
Part of #116