Conversation

@LucasWilkinson (Collaborator) commented Dec 7, 2025

Fixes:

ValueError: Selected backend AttentionBackendEnum.CUTLASS_MLA is not valid for this configuration. Reason: ['kv_cache_dtype not supported', 'sparse not supported']

which is raised when running

vllm serve deepseek-ai/DeepSeek-V3.2

on 8xB200.
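For context, the error comes from vLLM's attention-backend validation, which collects the reasons a selected backend cannot serve the current model configuration. A minimal sketch of that shape, assuming hypothetical names (`validate_backend`, `supports_kv_cache_dtype`, `supports_sparse`); only the reason strings are taken from the traceback above:

```python
# Illustrative sketch only: the reason strings match the traceback above,
# but the function and flag names are assumptions, not vLLM's actual API.
def validate_backend(backend: str,
                     supports_kv_cache_dtype: bool,
                     supports_sparse: bool) -> None:
    reasons = []
    if not supports_kv_cache_dtype:
        reasons.append("kv_cache_dtype not supported")
    if not supports_sparse:
        reasons.append("sparse not supported")
    if reasons:
        raise ValueError(
            f"Selected backend {backend} is not valid for this "
            f"configuration. Reason: {reasons}"
        )

# Reproduces the error above for a backend lacking both capabilities:
# validate_backend("AttentionBackendEnum.CUTLASS_MLA", False, False)
```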

Signed-off-by: Lucas Wilkinson <[email protected]>
@LucasWilkinson requested a review from mgoin December 7, 2025 05:47
@mergify bot added the deepseek (Related to DeepSeek models) and nvidia labels Dec 7, 2025
@gemini-code-assist bot (Contributor) left a comment


Code Review

This change stops forcing CutlassMLA on Blackwell GPUs when sparse attention (specifically DSv3.2) is enabled, by adding a `not use_sparse` condition to the relevant configuration check.
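As a rough illustration of that guard, here is a minimal sketch; `pick_mla_backend`, `is_blackwell`, and the backend strings are placeholders, and only the `not use_sparse` condition reflects the actual change:

```python
# Hedged sketch of the selection guard; names and return values are
# placeholders, not vLLM's actual selection logic.
def pick_mla_backend(is_blackwell: bool, use_sparse: bool) -> str:
    # Before the fix, Blackwell unconditionally forced CUTLASS_MLA, which
    # then failed validation for sparse models such as DeepSeek-V3.2.
    if is_blackwell and not use_sparse:
        return "CUTLASS_MLA"
    # Sparse models now fall through to the generic selection path, which
    # can pick a backend that supports sparse attention.
    return "FLASHMLA"
```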

@houseroad (Collaborator) left a comment


Thanks for the fix.

@github-project-automation bot moved this to In review in NVIDIA Dec 7, 2025
@mgoin added the ready (ONLY add when PR is ready to merge/full CI is needed) label Dec 7, 2025
@LucasWilkinson merged commit 0044c40 into vllm-project:main Dec 7, 2025
49 of 51 checks passed
@github-project-automation bot moved this from In review to Done in NVIDIA Dec 7, 2025

Labels

deepseek (Related to DeepSeek models), nvidia, ready (ONLY add when PR is ready to merge/full CI is needed)

Projects

Status: Done

3 participants