
fix(rocm): apply RoPE for embedding models without KV cache #973

Open

siluzhou wants to merge 1 commit into main from develop/tbstars_225

Conversation

siluzhou (Collaborator) commented May 7, 2026

When kv_cache is None (embedding models), AiterPrefillImplAsm and AiterPrefillImplNonAsm skipped RoPE and went straight to _forward_varlen. This caused severe output divergence for models that require positional encoding (e.g. tbstars with rope_theta=10000).

Also update _forward_varlen to handle the (q, k_padded, v_padded) tuple returned by the RoPE C++ op via unpad_kv_vectorized.
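
For readers without the diff open, here is a minimal sketch of the fixed control flow. Only the class names, `_forward_varlen`, `unpad_kv_vectorized`, `need_rope_kv_cache`, and the `(q, k_padded, v_padded)` tuple come from this PR; the stand-in `apply_rope_cpp_op` and all signatures are illustrative assumptions, not the actual rtp_llm code.

```python
# Minimal sketch of the described fix, not the actual rtp_llm code.
# apply_rope_cpp_op is a stand-in name for the RoPE C++ op.

def apply_rope_cpp_op(q, k, v, attn_inputs):
    """Stand-in for the RoPE C++ op; returns (q, k_padded, v_padded)."""
    raise NotImplementedError

def unpad_kv_vectorized(k_padded, v_padded, attn_inputs):
    """Stand-in for the existing helper in aiter.py."""
    raise NotImplementedError

class AiterPrefillImplAsm:
    need_rope_kv_cache = True

    def forward(self, q, k, v, attn_inputs, kv_cache=None):
        if kv_cache is None:
            # Before the fix, this branch skipped RoPE entirely and went
            # straight to _forward_varlen, so embedding models lost
            # positional information.
            if self.need_rope_kv_cache:
                qkv = apply_rope_cpp_op(q, k, v, attn_inputs)
            else:
                qkv = (q, k, v)
            return self._forward_varlen(qkv, attn_inputs)
        # KV-cache path unchanged (elided).

    def _forward_varlen(self, qkv, attn_inputs):
        q, k, v = qkv
        if self.need_rope_kv_cache:
            # K/V come back padded from the RoPE op; strip the padding in
            # one vectorized pass instead of a per-batch Python loop.
            k, v = unpad_kv_vectorized(k, v, attn_inputs)
        # Run the varlen FMHA kernel on (q, k, v) (elided).
```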

siluzhou requested a review from LLLLKKKK as a code owner · May 7, 2026 11:17
LLLLKKKK (Collaborator) commented May 7, 2026

AI Code Review - PR #973

Status: LGTM

Summary: P0/0 · P1/0 · P2/1 · P3/0

lgtm ready to ci

Non-blocking Suggestions

P2

  • Missing regression test for the no-KV RoPE path @ rtp_llm/models_py/modules/factory/attention/rocm_impl/aiter.py:916
    • Suggestion: add a ROCm unit or smoke test that exercises the AiterPrefillImplAsm/NonAsm path with need_rope_kv_cache=True and kv_cache=None, and verify the output against a reference implementation with RoPE applied (see the sketch below).
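
A rough shape such a test could take, assuming the classes are importable from the aiter module cited above and that `make_attn_config`, `make_prefill_inputs`, and `reference_attention_with_rope` are hypothetical fixtures (not existing rtp_llm APIs):

```python
# Hypothetical regression-test sketch; the fixtures marked "assumed" are
# not real rtp_llm APIs, and tolerances are placeholders.
import torch
from rtp_llm.models_py.modules.factory.attention.rocm_impl.aiter import (
    AiterPrefillImplAsm,
    AiterPrefillImplNonAsm,
)

def test_no_kv_rope_prefill_matches_reference():
    for impl_cls in (AiterPrefillImplAsm, AiterPrefillImplNonAsm):
        impl = impl_cls(make_attn_config(need_rope_kv_cache=True))  # assumed ctor
        q, k, v, attn_inputs = make_prefill_inputs()                # assumed fixture
        out = impl.forward(q, k, v, attn_inputs, kv_cache=None)
        ref = reference_attention_with_rope(q, k, v, attn_inputs)   # assumed reference
        torch.testing.assert_close(out, ref, rtol=1e-3, atol=1e-3)
```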

Checklist Violations (2 fail / 97 total)

General Principles Checklist

  • [6.1] Tests — new logic should have focused unit tests plus related integration/smoke tests → issue: missing regression test for the no-KV RoPE path
    The diff adds the no-KV RoPE wrapper path and tuple varlen input handling, but adds no test covering kv_cache=None + need_rope_kv_cache=True.

RTP-LLM Checklist

  • [H] Testing & CI — test coverage should be sufficient: equivalence coverage for large refactors, end-to-end tests for new features → issue: missing regression test for the no-KV RoPE path
    The no-KV RoPE fix ships with no corresponding ROCm unit or smoke test in the diff.

Strengths

  • The fix covers both the ASM and NonAsm prefill no-KV branches, avoiding behavioral divergence between the two ROCm backends.
  • _forward_varlen reuses the existing unpad_kv_vectorized rather than reintroducing a per-batch Python unpad loop.

siluzhou force-pushed the develop/tbstars_225 branch from 91cd1f9 to 049caba · May 7, 2026 13:56
LLLLKKKK (Collaborator) commented May 7, 2026

AI Code Review - PR #973

Status: LGTM

Summary: P0/0 · P1/0 · P2/0 · P3/0

lgtm ready to ci

Checklist ✅ (81 items passed)

Strengths

  • Both the ASM and NonASM no-KV prefill branches now run RoPE before entering FMHA, so the scope of the fix is consistent.
  • The new wrapper regression test covers the call order in which the RoPE output is passed to FMHA when need_rope_kv_cache=True and kv_cache=None (see the sketch after this list).
  • _forward_varlen reuses the existing unpad_kv_vectorized rather than reintroducing a per-batch Python unpad loop.
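
For illustration, a call-order check of that shape might use unittest.mock; the patch target, the `apply_rope_cpp_op` name, and the fixtures below are hypothetical, not the PR's actual test code.

```python
# Hypothetical sketch of a call-order check; the module path in the patch
# target and the fixtures are assumptions, not the PR's real test.
from unittest import mock
from rtp_llm.models_py.modules.factory.attention.rocm_impl.aiter import (
    AiterPrefillImplAsm,
)

def test_rope_output_is_passed_to_fmha():
    impl = AiterPrefillImplAsm(make_attn_config(need_rope_kv_cache=True))  # assumed ctor
    q, k, v, attn_inputs = make_prefill_inputs()                           # assumed fixture
    roped = (mock.sentinel.q, mock.sentinel.k_padded, mock.sentinel.v_padded)
    rope_target = ("rtp_llm.models_py.modules.factory.attention."
                   "rocm_impl.aiter.apply_rope_cpp_op")  # assumed op name
    with mock.patch(rope_target, return_value=roped), \
         mock.patch.object(impl, "_forward_varlen") as fmha:
        impl.forward(q, k, v, attn_inputs, kv_cache=None)
    # The RoPE output, not the raw q/k/v, must reach the FMHA path.
    fmha.assert_called_once_with(roped, attn_inputs)
```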

wht21 (Collaborator) commented May 7, 2026

Internal source has been updated; please review the changes!

