fix(rocm): propagate hw_kernel_config to qwen35 layers by chengshu-lcc · Pull Request #970 · alibaba/rtp-llm

chengshu-lcc · 2026-05-07T03:48:18Z

Overview

Fixes an accuracy divergence on ROCm: with USE_SWIZZLEA=1, Qwen3-Next could not take the swizzled-A hipBLASLt path end to end.

Add W.linear_attn_ba_w to the swizzle-A weight list in RocmImpl, so that in_proj_ba is pre-swizzled together with the other linear attention weights.
Pass hw_kernel_config from Qwen3NextModel all the way through Qwen3NextDecoderLayer → Qwen3NextAttention / Qwen3NextGatedDeltaNet / DenseMLP, as well as the shared expert of GenericMoeLayer, so that LinearFactory.create_linear_from_weights selects the swizzled-A kernel correctly.
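
The propagation described above can be illustrated with a minimal, self-contained sketch. It does not use the real rtp-llm classes: `HWKernelConfig`, its `use_swizzleA` field, `select_kernel`, `Leaf`, `DecoderLayer`, and `Model` are all hypothetical stand-ins, and the kernel names are made up for illustration. The point is only that a sub-module which never receives the config falls back to the default kernel.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class HWKernelConfig:
    """Stand-in for the real config object; the field name is an assumption."""
    use_swizzleA: bool = False


def select_kernel(hw_kernel_config: Optional[HWKernelConfig]) -> str:
    """Hypothetical stand-in for the kernel choice made when a linear is created."""
    if hw_kernel_config is not None and hw_kernel_config.use_swizzleA:
        return "swizzled_a"
    return "default"


class Leaf:
    """Plays the role of an attention, linear-attention, or MLP sub-module."""

    def __init__(self, hw_kernel_config: Optional[HWKernelConfig] = None) -> None:
        self.kernel = select_kernel(hw_kernel_config)


class DecoderLayer:
    def __init__(self, hw_kernel_config: Optional[HWKernelConfig] = None) -> None:
        # Each sub-module receives the same config; omitting it here would
        # silently fall back to the default kernel for that sub-module.
        self.submodules: Dict[str, Leaf] = {
            "attention": Leaf(hw_kernel_config),
            "linear_attention": Leaf(hw_kernel_config),
            "mlp": Leaf(hw_kernel_config),
        }

    def kernels(self) -> Dict[str, str]:
        return {name: module.kernel for name, module in self.submodules.items()}


class Model:
    def __init__(self, num_layers: int, hw_kernel_config: Optional[HWKernelConfig] = None) -> None:
        self.layers: List[DecoderLayer] = [DecoderLayer(hw_kernel_config) for _ in range(num_layers)]


model = Model(num_layers=2, hw_kernel_config=HWKernelConfig(use_swizzleA=True))
layer_kernels = [layer.kernels() for layer in model.layers]
```

Because every constructor defaults the config to `None`, a missed hand-off does not raise an error; it only changes which kernel is selected, which is why this kind of omission can surface as an accuracy problem rather than a crash.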

LLLLKKKK · 2026-05-07T04:26:14Z

AI Code Review - PR #970

Status: BLOCKING

Summary: P0/0 · P1/1 · P2/0 · P3/0

Blocking Issues

P1

Qwen3-Next MTP does not propagate hw_kernel_config @ rtp_llm/models_py/model_desc/qwen3_next.py:775
- Suggestion: when constructing the Qwen3NextDecoderLayer of Qwen3NextMTPModel, also pass py_hw_kernel_config, and add MTP + use_swizzleA coverage.
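
One way to express the requested coverage is a check that runs over every model variant and reports the ones whose layers did not receive the config. The sketch below is an assumption-laden illustration, not the project's test: `FakeHWKernelConfig`, `FakeDecoderLayer`, the three `build_*` functions, and `variants_missing_config` are hypothetical names, and the "before fix" builder simply mimics a caller that forgets the argument.

```python
from typing import Callable, Dict, List, Optional


class FakeHWKernelConfig:
    """Hypothetical stand-in; the real config type and its fields may differ."""

    def __init__(self, use_swizzleA: bool) -> None:
        self.use_swizzleA = use_swizzleA


class FakeDecoderLayer:
    """Stand-in decoder layer that only records the config it was given."""

    def __init__(self, hw_kernel_config: Optional[FakeHWKernelConfig] = None) -> None:
        self.hw_kernel_config = hw_kernel_config

    def uses_swizzled_a(self) -> bool:
        cfg = self.hw_kernel_config
        return cfg is not None and cfg.use_swizzleA


Builder = Callable[[FakeHWKernelConfig], List[FakeDecoderLayer]]


def build_main(cfg: FakeHWKernelConfig) -> List[FakeDecoderLayer]:
    return [FakeDecoderLayer(hw_kernel_config=cfg) for _ in range(2)]


def build_mtp_before_fix(cfg: FakeHWKernelConfig) -> List[FakeDecoderLayer]:
    # Mimics the reported problem: the caller drops the config.
    return [FakeDecoderLayer()]


def build_mtp_after_fix(cfg: FakeHWKernelConfig) -> List[FakeDecoderLayer]:
    return [FakeDecoderLayer(hw_kernel_config=cfg)]


def variants_missing_config(builders: Dict[str, Builder], cfg: FakeHWKernelConfig) -> List[str]:
    """Return the names of variants where at least one layer lacks the swizzled path."""
    return sorted(
        name
        for name, build in builders.items()
        if not all(layer.uses_swizzled_a() for layer in build(cfg))
    )


cfg = FakeHWKernelConfig(use_swizzleA=True)
before = variants_missing_config({"main": build_main, "mtp": build_mtp_before_fix}, cfg)
after = variants_missing_config({"main": build_main, "mtp": build_mtp_after_fix}, cfg)
```

Listing the variants in one table means a newly added variant has to be registered in the check explicitly, which makes a forgotten hand-off easier to notice.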

Checklist Violations (4 fail / 63 total)

General Principles Checklist

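[6.1] Architecture — Compatibility: public API / persistent data / configuration / environment migration is safe → issue Qwen3-Next MTP does not propagate hw_kernel_config
Qwen3NextDecoderLayer now requires hw_kernel_config to be propagated, but the Qwen3NextMTPModel caller does not pass it, so configuration propagation does not cover all model variants.
[6.1] Tests — New logic has focused unit tests plus relevant integration/smoke tests → issue Qwen3-Next MTP does not propagate hw_kernel_config
There is no coverage for the MTP + ROCm use_swizzleA scenario, so nothing enforces that the new hw_kernel_config propagation must cover Qwen3NextMTPModel.
[6.1] Tests — Distributed/cross-platform changes have corresponding coverage → issue Qwen3-Next MTP does not propagate hw_kernel_config
The change targets Linear selection for ROCm use_swizzleA, but no cross-platform coverage of the MTP variant is visible; this item is merged with the missing-configuration issue.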

RTP-LLM Checklist

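[A] Compatibility and configuration — New model configuration fields are propagated to all model variants → issue Qwen3-Next MTP does not propagate hw_kernel_config
The main Qwen3NextModel passes py_hw_kernel_config, but Qwen3NextMTPModel still constructs Qwen3NextDecoderLayer with the old arguments, so the MTP variant misses it.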

Strengths

The main Qwen3NextModel path already passes py_hw_kernel_config on to attention, linear attention, DenseMLP, and GenericMoeLayer, and the change is narrowly scoped.

LLLLKKKK · 2026-05-07T06:33:33Z

AI Code Review - PR #970

Status: LGTM

Summary: P0/0 · P1/0 · P2/1 · P3/0

lgtm ready to ci

Non-blocking Suggestions

P2

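New type annotation lacks an HWKernelConfig import @ rtp_llm/models_py/model_desc/qwen3_next.py:461
- Suggestion: add HWKernelConfig to the rtp_llm.ops import list to keep static type checking working and stay consistent with similar files.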
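
A small sketch of why an unimported annotation name is worth fixing even when the code runs. How the annotation is actually written in qwen3_next.py is not shown here; the sketch assumes a quoted (string) annotation, and `make_layer`, `annotation_error`, and the local `HWKernelConfig` class are hypothetical stand-ins. The namespaces passed to `typing.get_type_hints` play the role of the module's import list.

```python
from typing import Optional, get_type_hints


def make_layer(hw_kernel_config: "Optional[HWKernelConfig]" = None):
    # A quoted annotation is not evaluated at definition time, so this
    # function can be defined and called even if the name is never imported.
    return hw_kernel_config


def annotation_error(fn, namespace):
    """Return the NameError message raised while resolving hints, or None."""
    try:
        get_type_hints(fn, globalns=namespace)
    except NameError as exc:
        return str(exc)
    return None


class HWKernelConfig:
    """Stand-in for the real class that would be imported."""


# Import list without the name: resolving the annotation fails.
missing_import = annotation_error(make_layer, {"Optional": Optional})

# Import list with the name: the annotation resolves.
with_import = annotation_error(
    make_layer, {"Optional": Optional, "HWKernelConfig": HWKernelConfig}
)
```

Static type checkers and any tool that resolves annotations at runtime hit the same unresolved name, which is consistent with the reviewer rating this as non-blocking but still worth cleaning up.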

Checklist ✅ (74 items passed)

Strengths

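Both the main model and the MTP path now pass py_hw_kernel_config on to the Qwen3-Next decoder layer; the MTP omission risk raised in the previous round is covered.
The weight rewrite for linear_attn_ba_w and the LinearFactory configuration for in_proj_ba are updated together, making the swizzleA path more consistent overall.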

chengshu-lcc requested a review from LLLLKKKK as a code owner May 7, 2026 03:48

fix USE_SWIZZLEA=1 bug

4ccec20

chengshu-lcc force-pushed the feature/enable_hipbmm branch from a98afd6 to 4ccec20 Compare May 7, 2026 06:23

chengshu-lcc wants to merge 1 commit into alibaba:main from chengshu-lcc:feature/enable_hipbmm
