
Hybrid SSM exclusion and multimodal adopt policy under the unified cache #124

@inureyes

Description


Context

Hybrid SSM models carry recurrent state beyond the KV cache, so restoring only the cached KV blocks for a prefix leaves that state unrestored and corrupts their output; APC already excludes them. Multimodal requests currently opt out of adopt/donate. Both behaviors must be preserved under the unified store.
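To illustrate why a KV-only restore is not enough, here is a toy sketch. It assumes a single scalar linear recurrence (s_t = decay * s_{t-1} + x_t) as a stand-in for an SSM layer; this is not any real model's update rule. The state reached after the prefix is exactly what restoring blocks alone does not bring back.

```python
def run_recurrence(tokens, state=0.0, decay=0.5):
    """Toy SSM-like layer: s_t = decay * s_{t-1} + x_t.
    Returns the per-token outputs and the final recurrent state."""
    outputs = []
    for x in tokens:
        state = decay * state + x
        outputs.append(state)
    return outputs, state


prefix, suffix = [1.0, 2.0, 3.0], [4.0, 5.0]

# Cold prefill over the full prompt: the reference output.
cold, _ = run_recurrence(prefix + suffix)

# Correct reuse: resume from the recurrent state saved after the prefix.
_, saved_state = run_recurrence(prefix)
resumed, _ = run_recurrence(suffix, state=saved_state)

# KV-only reuse: the prefix is skipped, but the recurrent state restarts at 0.
kv_only, _ = run_recurrence(suffix)

print(cold[len(prefix):])  # [6.125, 8.0625]
print(resumed)             # [6.125, 8.0625]
print(kv_only)             # [4.0, 7.0]  (diverges from the cold reference)
```

Sharing would only be safe if the recurrent state were saved and restored alongside the blocks; the carve-out sidesteps this by not sharing at all for these models.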

Tasks

  • Keep the SSM/recurrent model family excluded from block sharing (the existing carve-out), and verify that the unified path falls back to no-share for these models. Affected model_type strings: jamba, mamba, mamba2, nemotron_h, gated_delta, kimi_linear, qwen3_next, falcon_mamba, longcat_flash, rwkv7, recurrent_gemma.
  • Decide and implement the VLM policy: either keep VLM cold-prefill, or fold mm_digest into the block-match key so image/audio prefixes can share blocks without cross-modal collisions.
  • Add regression coverage verifying that an SSM model under the paged backend produces unchanged output.
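The first two tasks can be sketched as a sharing gate plus a block-match key. This is a minimal sketch, assuming chained per-block SHA-256 keys and a boolean that says whether the mm_digest option is enabled. `may_share_blocks`, `block_keys`, `mm_keyed`, and the example model_type "llama" are hypothetical names for illustration, not the project's API; only the model_type strings come from this issue.

```python
import hashlib
from typing import List, Optional, Sequence

# model_type strings for the SSM/recurrent carve-out, as listed in this issue.
RECURRENT_MODEL_TYPES = frozenset({
    "jamba", "mamba", "mamba2", "nemotron_h", "gated_delta", "kimi_linear",
    "qwen3_next", "falcon_mamba", "longcat_flash", "rwkv7", "recurrent_gemma",
})


def may_share_blocks(model_type: str, has_multimodal: bool, mm_keyed: bool) -> bool:
    """Hypothetical policy gate; False means fall back to no-share (cold prefill)."""
    if model_type in RECURRENT_MODEL_TYPES:
        return False  # recurrent state lives outside the KV blocks
    if has_multimodal and not mm_keyed:
        return False  # option 1: VLM requests stay on cold prefill
    return True


def block_keys(token_ids: Sequence[int], block_size: int,
               mm_digest: Optional[bytes] = None) -> List[bytes]:
    """Chained keys for each full block; key i covers tokens [0, (i+1)*block_size).

    Option 2: mm_digest seeds the chain, so identical token ids (e.g. the same
    image placeholder tokens) with different image/audio inputs never match.
    """
    if mm_digest is None:
        parent = hashlib.sha256(b"text-only").digest()
    else:
        parent = hashlib.sha256(b"mm:" + mm_digest).digest()
    keys = []
    n_full = len(token_ids) - len(token_ids) % block_size  # partial tail is not shared
    for start in range(0, n_full, block_size):
        h = hashlib.sha256(parent)  # chain on the previous block's key
        for t in token_ids[start:start + block_size]:
            h.update(t.to_bytes(8, "little"))  # assumes 0 <= token id < 2**64
        parent = h.digest()
        keys.append(parent)
    return keys


print(may_share_blocks("mamba2", has_multimodal=False, mm_keyed=True))  # False
print(may_share_blocks("llama", has_multimodal=True, mm_keyed=False))   # False
print(may_share_blocks("llama", has_multimodal=True, mm_keyed=True))    # True
```

Seeding the whole chain with mm_digest is the conservative choice: it also stops two requests with different images from sharing a text-only prefix that precedes the image. A finer variant would fold the digest in starting at the first block that contains multimodal placeholder tokens.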

Acceptance criteria

  • SSM/hybrid models produce output identical to the pre-epic behavior.
  • The VLM policy is documented and enforced; no incorrect cross-modal block reuse is possible.
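A regression check for the first criterion could compare each prompt's output on the paged backend against a cold-prefill baseline (the pre-epic build or the no-share path), running each prompt twice so the second run can hit cached blocks. The sketch below assumes a hypothetical `generate(prompt_ids) -> output_ids` callable per engine; `find_cache_regressions` and `make_toy_engine` are illustrative names. The toy engine stands in for a hybrid model, and its `reuse_kv_only` switch reproduces the corruption the carve-out prevents, so the check can be seen to catch it.

```python
from typing import Callable, List, Sequence

# Hypothetical engine interface: prompt token ids -> output token ids.
Generate = Callable[[Sequence[int]], List[int]]


def find_cache_regressions(cold: Generate, paged: Generate,
                           prompts: Sequence[Sequence[int]]) -> List[List[int]]:
    """Return the prompts whose paged-backend output differs from the cold baseline.
    Each prompt runs twice on the paged engine so the second run may reuse blocks."""
    regressions = []
    for prompt in prompts:
        expected = cold(prompt)
        first, second = paged(prompt), paged(prompt)
        if first != expected or second != expected:
            regressions.append(list(prompt))
    return regressions


def make_toy_engine(reuse_kv_only: bool) -> Generate:
    """Toy stand-in for a hybrid SSM model, not a real engine. With
    reuse_kv_only=True, a prefix hit skips the prefix tokens but restarts the
    recurrent state, mimicking the bug the carve-out guards against."""
    seen_prefixes = set()

    def generate(prompt: Sequence[int]) -> List[int]:
        prefix = tuple(prompt[:-1])
        hit = reuse_kv_only and prefix in seen_prefixes
        seen_prefixes.add(prefix)
        state = 0
        for x in (prompt[-1:] if hit else prompt):
            state = 2 * state + x  # recurrent update
        return [state]

    return generate


prompts = [[1, 2, 3], [4, 5]]
print(find_cache_regressions(make_toy_engine(False), make_toy_engine(False), prompts))  # []
print(find_cache_regressions(make_toy_engine(False), make_toy_engine(True), prompts))
# [[1, 2, 3], [4, 5]]
```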

Dependencies

Blocked by Phase 4 (unification).

Part of #116

Metadata


Assignees

No one assigned

    Labels

    • area:core (mlxcel-core: MLX FFI, primitives, KV cache, layers)
    • area:models (Model architectures, weights, loading, metadata)
    • status:backlog (In the backlog, not yet ready)
    • type:enhancement (New features, capabilities, or significant additions)

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests
