Phase 2: Paged decode attention over real block tables #119
Open
Labels
area:core (mlxcel-core: MLX FFI, primitives, KV cache, layers)
area:inference (Generation, sampling, decoding (incl. speculative, DRY))
platform:macos (macOS (Apple Silicon) specific)
status:backlog (In the backlog, not yet ready)
type:enhancement (New features, capabilities, or significant additions)
Context
The decode kernel currently slices per-sequence dense buffers using an identity block table built by PagedDecodeMetadata::from_visible_lengths([0, 1, 2, ...]). With real pool storage in place, attention must gather the sequence's actual, possibly non-contiguous, physical blocks.
Tasks
- PagedSequenceState block tables in the decode dispatch (src/models/model_owned.rs and the per-model paths in qwen3.rs / llama3.rs / gemma3.rs / llama4.rs).
- paged_decode_attention_dense_compat (C++ in cpp/mlx_cxx_bridge.cpp) and the Rust fallback paged_decode_attention_dense_fallback (layers.rs) to gather by physical block id from the pool (Phase 0 option A).
- paged_decode_attention_rotating_compat.
Acceptance criteria
Dependencies
Blocked by Phase 1 (global block-pool tensor storage).
Part of #116
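Illustrative sketch
The gather described under Context can be pictured with a small, self-contained Rust example. This is only a sketch under stated assumptions, not the project's code: `BlockPool`, `row`, and `gather_rows` are hypothetical names, the pool is modelled as a flat `Vec<f32>` rather than an MLX tensor, and a single head with one row per token is assumed. It shows a token position being mapped to (physical block id, slot) through a block table, so an identity table and a non-contiguous table go through the same code path.

```rust
/// Hypothetical flat pool holding `num_blocks * block_size * head_dim` values.
/// The real pool storage from Phase 1 may be laid out differently.
struct BlockPool {
    data: Vec<f32>,
    block_size: usize,
    head_dim: usize,
}

impl BlockPool {
    /// Borrow the row stored at `slot` inside physical block `block_id`.
    fn row(&self, block_id: usize, slot: usize) -> &[f32] {
        let start = (block_id * self.block_size + slot) * self.head_dim;
        &self.data[start..start + self.head_dim]
    }
}

/// Gather the first `visible_len` token rows of one sequence.
/// `block_table[i]` is the physical block id backing logical block `i`;
/// the ids do not have to be contiguous or ordered.
fn gather_rows(pool: &BlockPool, block_table: &[usize], visible_len: usize) -> Vec<f32> {
    let mut out = Vec::with_capacity(visible_len * pool.head_dim);
    for token in 0..visible_len {
        // Logical block index, then the physical id it maps to.
        let block_id = block_table[token / pool.block_size];
        // Position of the token inside that block.
        let slot = token % pool.block_size;
        out.extend_from_slice(pool.row(block_id, slot));
    }
    out
}
```

With an identity table such as `[0, 1]` this reduces to a plain slice of the pool, which matches the current behaviour described under Context; a table such as `[3, 1]` reads the same logical positions from scattered physical blocks.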