Context
Hybrid SSM models carry recurrent state beyond the KV cache, so restoring only the KV blocks corrupts their output; APC already excludes them. Multimodal requests currently opt out of adopt/donate. Both behaviors must be preserved under the unified store.
Tasks
Keep the SSM/recurrent model families excluded from block sharing, and verify that the unified path falls back to no-share for them. Affected model_type strings: jamba, mamba, mamba2, nemotron_h, gated_delta, kimi_linear, qwen3_next, falcon_mamba, longcat_flash, rwkv7, recurrent_gemma.
Decide and implement the VLM policy: either keep VLM requests on cold prefill (no adopt/donate, as today), or fold mm_digest into the block-match key so image/audio prefixes can share blocks without cross-modal collisions.
Add regression coverage verifying that an SSM model under the paged backend produces unchanged output.
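One way the first two tasks could look is sketched below. This is a minimal illustration, not the project's actual code: the names RECURRENT_MODEL_TYPES, can_share_blocks, and block_match_key are hypothetical, and the chained SHA-256 key layout is an assumption. Only the model_type strings and the idea of folding mm_digest into the key come from this issue.

```python
import hashlib
from typing import Optional, Sequence

# model_type strings listed in this issue as SSM/recurrent families.
RECURRENT_MODEL_TYPES = frozenset({
    "jamba", "mamba", "mamba2", "nemotron_h", "gated_delta", "kimi_linear",
    "qwen3_next", "falcon_mamba", "longcat_flash", "rwkv7", "recurrent_gemma",
})


def can_share_blocks(model_type: str) -> bool:
    """Hypothetical gate: recurrent families never adopt or donate blocks."""
    return model_type.lower() not in RECURRENT_MODEL_TYPES


def block_match_key(
    parent_key: bytes,
    token_ids: Sequence[int],
    mm_digest: Optional[bytes] = None,
) -> bytes:
    """Hypothetical block-match key: hash of parent key, modality tag, tokens.

    A one-byte tag separates text-only blocks from multimodal ones, and the
    length-prefixed mm_digest separates different images/audio clips whose
    placeholder token ids would otherwise be identical.
    """
    h = hashlib.sha256()
    h.update(parent_key)
    if mm_digest is None:
        h.update(b"\x00")  # text-only block
    else:
        h.update(b"\x01")  # multimodal block
        h.update(len(mm_digest).to_bytes(4, "big"))
        h.update(mm_digest)
    for token_id in token_ids:
        h.update(int(token_id).to_bytes(8, "big", signed=True))
    return h.digest()
```

Under this sketch, the unified path would check can_share_blocks once per model and skip adopt/donate entirely when it returns False. If the VLM policy stays at cold prefill instead, block_match_key would simply never be called with a non-None mm_digest.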
Acceptance criteria
SSM/hybrid models produce output identical to the pre-epic behavior.
The VLM policy is documented and enforced; no incorrect cross-modal block reuse is possible.
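The first criterion could be checked with a token-level comparison along the following lines. This is a rough sketch under stated assumptions: generate_baseline and generate_unified are hypothetical callables standing in for the pre-epic path and the unified paged path, both are assumed to decode greedily (so output is deterministic), and both are assumed to return a sequence of token ids.

```python
from typing import Callable, Iterable, Optional, Sequence


def first_divergence(
    expected: Sequence[int], actual: Sequence[int]
) -> Optional[int]:
    """Return the first index where two token sequences differ, else None."""
    for i, (e, a) in enumerate(zip(expected, actual)):
        if e != a:
            return i
    if len(expected) != len(actual):
        # One sequence is a strict prefix of the other.
        return min(len(expected), len(actual))
    return None


def assert_unchanged_output(
    generate_baseline: Callable[[str], Sequence[int]],
    generate_unified: Callable[[str], Sequence[int]],
    prompts: Iterable[str],
) -> None:
    """Fail if any prompt yields different tokens on the two paths."""
    for prompt in prompts:
        expected = generate_baseline(prompt)
        actual = generate_unified(prompt)
        idx = first_divergence(expected, actual)
        assert idx is None, (
            f"output diverged at token {idx} for prompt {prompt!r}"
        )
```

Running each prompt twice on the unified path would also exercise the case where a second request could have adopted blocks donated by the first, which is where a missed carve-out would show up.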
Dependencies
Blocked by Phase 4 (unification).
Part of #116