End-to-end validation, parity, benchmarks, and docs #126

@inureyes

Context

Final integration checkpoint that proves the unified store is correct, measures its payoff on Apple Silicon, and brings the docs in line.

Phase 0 note (ADR 0001, #117): the #117 microbench (examples/page_gather_microbench.rs) measures only the attention sub-step, so its overhead figures (~15-67% single-sequence, higher under batching) are diluted at the model level, where attention accounts for only part of the time spent on each decode token. End-to-end benchmarks here must therefore measure model-level decode throughput (examples/profile_paged_decode_kernel.rs gives the paged-vs-dense model-level comparison) and must include concurrent/batched-decode scenarios, because Phase 0 found that batching amplifies the gather cost far more than context length does. See docs/adr/0001-paged-attention-gather-vs-fused-kernel.md.
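The dilution argument is Amdahl's law. If attention takes a fraction f of a decode token's time and the gather adds a relative overhead o to attention alone, the whole token slows by f * o. A minimal sketch in Rust, where the function name and the 30% attention share are hypothetical illustration values, not measurements from this project:

```rust
/// Relative slowdown of a whole decode token when only the attention
/// sub-step gets slower (Amdahl's law).
///
/// Token time T = A + R (attention + everything else). With the paged gather,
/// attention becomes A * (1 + o), so the token slows by A * o / T = f * o,
/// where f = A / T is attention's share of the token.
fn model_level_overhead(attn_fraction: f64, attn_overhead: f64) -> f64 {
    assert!(
        (0.0..=1.0).contains(&attn_fraction),
        "attention share must be in [0, 1]"
    );
    attn_fraction * attn_overhead
}

fn main() {
    // Hypothetical attention share of a decode token; measure the real value
    // with a model-level profile instead of trusting this number.
    let attn_fraction = 0.30;
    // Endpoints of the single-sequence microbench overhead range.
    for attn_overhead in [0.15, 0.67] {
        let total = model_level_overhead(attn_fraction, attn_overhead);
        println!(
            "attention +{:.0}% -> decode token +{:.1}%",
            attn_overhead * 100.0,
            total * 100.0
        );
    }
}
```

With a 30% attention share, the 15-67% attention-level range shrinks to roughly 4.5-20% per token. Because batching raises the gather overhead o itself, the batched scenarios are where the model-level cost is most likely to show.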

Tasks

  • Correctness tests: concurrent shared-prefix requests, COW divergence, eviction under block pressure, fragmented block tables, and trim/rewind across shared blocks.
  • Parity vs the dense backend, and vs mlx-lm where applicable.
  • Benchmarks: memory saved per concurrent shared prefix, decode throughput delta vs the dense backend, and prefill tokens avoided. Update docs/model_tests*.md.
  • Docs: add a unified-cache section to docs/turbo-kv-cache.md and docs/en/prompt_cache.md; update docs/CONTINUOUS_BATCHING.md; remove the "Paged backend adopt/donate disabled" note from the prompt-cache Limitations section.
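The three benchmark metrics reduce to simple arithmetic once a run is described. A minimal sketch in Rust, assuming that (1) the dense backend keeps a private copy of the prefix for every request, (2) the unified store shares only the blocks the prefix fully covers, and (3) the model shape used for bytes_per_block is representative. SharedPrefixRun, its fields, and the helper functions are hypothetical names for illustration, not APIs from this repository:

```rust
/// Hypothetical description of one shared-prefix benchmark run.
struct SharedPrefixRun {
    concurrent_requests: u64, // requests sharing one identical prompt prefix
    prefix_tokens: u64,       // length of that prefix in tokens
    block_size: u64,          // tokens per KV block
    bytes_per_block: u64,     // KV bytes for one block across all layers
}

impl SharedPrefixRun {
    /// Blocks fully covered by the prefix. Assumption: a trailing partial
    /// block is not shared, so it is excluded from the savings.
    fn shared_blocks(&self) -> u64 {
        self.prefix_tokens / self.block_size
    }

    /// Assumed dense backend: one private prefix copy per request.
    /// Unified store: one shared copy, so N - 1 copies are saved.
    fn memory_saved_bytes(&self) -> u64 {
        self.concurrent_requests.saturating_sub(1) * self.shared_blocks() * self.bytes_per_block
    }

    /// Every request after the first reuses the cached blocks instead of
    /// prefilling those tokens again.
    fn prefill_tokens_avoided(&self) -> u64 {
        self.concurrent_requests.saturating_sub(1) * self.shared_blocks() * self.block_size
    }
}

/// Relative decode-throughput change, paged vs dense (positive = paged is faster).
fn throughput_delta(paged_tok_per_s: f64, dense_tok_per_s: f64) -> f64 {
    (paged_tok_per_s - dense_tok_per_s) / dense_tok_per_s
}

fn main() {
    let block_size: u64 = 16;
    let run = SharedPrefixRun {
        concurrent_requests: 8,
        prefix_tokens: 2048,
        block_size,
        // Hypothetical model shape: K and V x 32 layers x 8 KV heads
        // x head_dim 128 x 2 bytes (f16) x tokens per block.
        bytes_per_block: 2 * 32 * 8 * 128 * 2 * block_size,
    };
    println!("shared blocks:          {}", run.shared_blocks());
    println!("memory saved (MiB):     {}", run.memory_saved_bytes() / (1024 * 1024));
    println!("prefill tokens avoided: {}", run.prefill_tokens_avoided());
}
```

throughput_delta expects measured model-level tokens/s from the paged and dense runs (for example from examples/profile_paged_decode_kernel.rs). It is deliberately not called in main, so no placeholder number can be mistaken for a benchmark result.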

Acceptance criteria

  • All correctness tests pass on Apple Silicon.
  • Benchmark numbers and docs merged.
  • The prompt-cache "Limitations" note about paged adopt is removed because it no longer holds.

Dependencies

Blocked by Phases 4, 5, 6, and the coverage/distributed sub-issues.

Part of #116

Metadata

Assignees: none
Labels: area:docs (user and developer documentation), status:backlog (in the backlog, not yet ready), type:test (test-related changes)