Phase 3: Paged prefill into the block pool #120
Closed
Labels
area:core, area:inference, status:done, type:enhancement
Context
Prefill currently writes into dense buffers, and the paged path is decode-only (is_paged_decode requires q_len == 1). For an end-to-end paged sequence, prefill must allocate and write pool blocks, and it must be able to start after a shared prefix.
Tasks
… PagedBlockPool::write_prefill)
… copy_on_write_block)
Acceptance criteria
… PagedCacheStats).
Dependencies
Blocked by Phase 1 and Phase 2.
Part of #116
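As an illustration of the behaviour described under Context (prefill allocating and writing pool blocks, and starting after a shared prefix), here is a minimal, self-contained sketch. Everything in it is an assumption for illustration: ToyBlockPool is a hypothetical stand-in and not mlxcel's PagedBlockPool, the signatures of write_prefill and copy_on_write are invented, and a single f32 stands in for one token's K/V data.

```rust
/// Hypothetical toy pool, NOT the real PagedBlockPool.
/// Each physical block holds `block_size` token slots.
pub struct ToyBlockPool {
    block_size: usize,
    blocks: Vec<Vec<f32>>, // physical storage, one Vec per block
    ref_counts: Vec<u32>,  // number of block tables referencing each block
    free: Vec<usize>,      // free list of physical block ids
}

impl ToyBlockPool {
    pub fn new(num_blocks: usize, block_size: usize) -> Self {
        Self {
            block_size,
            blocks: vec![vec![0.0; block_size]; num_blocks],
            ref_counts: vec![0; num_blocks],
            // Reversed so that `pop` hands out block 0 first.
            free: (0..num_blocks).rev().collect(),
        }
    }

    /// Take one block off the free list, or None if the pool is exhausted.
    fn alloc(&mut self) -> Option<usize> {
        let id = self.free.pop()?;
        self.ref_counts[id] = 1;
        Some(id)
    }

    /// Share a prefix: the new block table points at the same physical blocks.
    pub fn fork(&mut self, table: &[usize]) -> Vec<usize> {
        for &id in table {
            self.ref_counts[id] += 1;
        }
        table.to_vec()
    }

    /// Return a block id that is safe to write. A block referenced by more
    /// than one table is copied into a fresh block first.
    fn copy_on_write(&mut self, id: usize) -> Option<usize> {
        if self.ref_counts[id] <= 1 {
            return Some(id);
        }
        let new_id = self.alloc()?;
        let data = self.blocks[id].clone();
        self.blocks[new_id] = data;
        self.ref_counts[id] -= 1;
        Some(new_id)
    }

    /// Write `values` at logical positions `start..start + values.len()`.
    /// `start` may be non-zero, e.g. right after a shared prefix.
    /// Returns None if the pool runs out of blocks or `start` would leave a
    /// gap in the table; earlier writes are not rolled back in this sketch.
    pub fn write_prefill(
        &mut self,
        table: &mut Vec<usize>,
        start: usize,
        values: &[f32],
    ) -> Option<()> {
        for (i, &v) in values.iter().enumerate() {
            let pos = start + i;
            let logical = pos / self.block_size;
            let offset = pos % self.block_size;
            if logical == table.len() {
                // First write into a new logical block: allocate it.
                let id = self.alloc()?;
                table.push(id);
            } else if logical > table.len() {
                return None;
            } else {
                // Existing block: make it private before writing.
                table[logical] = self.copy_on_write(table[logical])?;
            }
            self.blocks[table[logical]][offset] = v;
        }
        Some(())
    }

    /// Read the value stored at a logical position of a sequence.
    pub fn read(&self, table: &[usize], pos: usize) -> f32 {
        self.blocks[table[pos / self.block_size]][pos % self.block_size]
    }

    pub fn free_blocks(&self) -> usize {
        self.free.len()
    }
}
```

In this sketch, forking a three-token sequence with a block size of 2 and then prefilling from position 3 keeps the first (full) block shared, while the half-filled second block is copied before the new sequence writes into it. A real implementation would also need to release blocks and report statistics, which the sketch leaves out.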