feat(deps): unify pip deps via PEP 503 indexes + thin requirements#962

Open
LLLLKKKK wants to merge 1 commit into main from feature/pip_unify_v2

Conversation

@LLLLKKKK
Collaborator

Summary

Switches the build to a single set of PEP 503 indexes so the same wheel filenames and versions resolve cleanly across all platforms.

  • pip.bzl: thin pip-tools index list (download.pytorch.org per-CUDA, aliyun PyPI mirror, plus a public OSS bucket for the custom flash_attn / deep_ep / deep_gemm / flashinfer / rtp_kernel wheels we publish)
  • requirements_*.txt: thin per-platform inputs that share requirements_base.txt
  • lockfiles regenerated against the unified indexes
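Concretely, the thin-input layout described above would look something like this (contents illustrative, not the actual files — only pins already discussed in this PR are shown):

```text
# deps/requirements_base.txt — shared, platform-independent pins
apache-tvm-ffi==0.1.1

# deps/requirements_torch_gpu_cuda12_9.txt — thin per-platform input
-r requirements_base.txt
torch==2.8.0+cu129        # resolved via the download.pytorch.org cu129 index
flashinfer-python==0.6.0  # resolved via the OSS simple/ bucket
```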

Test plan

  • cuda12_9 bazel build passes against the new lockfile
  • CI on this PR

Copilot AI review requested due to automatic review settings April 30, 2026 12:53
Contributor

Copilot AI left a comment


Pull request overview

Unifies Python dependency resolution by switching from direct wheel URLs to package/version pins intended to be served via PEP 503 “simple” indexes, and wires Bazel pip parsing to use a shared index list across platforms.
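PEP 503 "simple" indexes only work across platforms because project names are normalized before lookup; a minimal sketch of that normalization rule (taken from the PEP), which is what lets spellings like `flash_attn` and `flash-attn` resolve to the same index entry:

```python
import re


def normalize(name: str) -> str:
    # PEP 503: collapse runs of "-", "_", "." to a single "-", then lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


print(normalize("Flash_Attn"))   # flash-attn
print(normalize("deep_ep"))      # deep-ep
```

Any custom index serving the flash_attn / deep_ep / rtp_kernel wheels must expose them under these normalized names for pip/uv to find them.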

Changes:

  • Replaced many direct wheel URLs in per-platform deps/requirements_*.txt with pinned package names/versions.
  • Expanded deps/pip.bzl PIP_EXTRA_ARGS to include the unified set of extra PEP 503 indexes (rtp-opensource simple + PyTorch per-ABI indexes + mirror).
  • Added a stable WORKSPACE repository alias (rtp_opensource_deps) intended to keep a handle to opensource deps/ when rtp_deps is overridden internally.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
deps/requirements_torch_gpu_cuda12_9.txt Switches CUDA 12.9 GPU deps from direct URLs to pinned names/versions.
deps/requirements_torch_gpu_cuda12.txt Switches CUDA 12.6 GPU deps from direct URLs to pinned names/versions.
deps/requirements_torch_cpu.txt Updates CPU Torch pin and removes direct wheel URLs.
deps/requirements_rocm.txt Switches ROCm deps to pinned names/versions and adds amd-smi.
deps/requirements_cuda12_arm.txt Switches CUDA12 ARM deps from direct URLs to pinned names/versions.
deps/requirements_cpu_arm.txt Replaces direct ARM CPU Torch wheel URL with a version pin.
deps/pip.bzl Centralizes pip index configuration via a shared extra-index list.
deps/http.bzl Adds TODO commentary about consolidating duplicated http_archive entries.
WORKSPACE Adds rtp_opensource_deps local_repository alias for internal override scenarios.


Comment thread deps/http.bzl Outdated
# Consolidate here as multi-URL lists (artlab mirror first, public URL fallback);
# requires per-config `bazel build` verification before landing to catch URL
# availability regressions.
# See /home/liukan.lk/.claude/plans/serialized-wibbling-snail.md Phase 5.
@LLLLKKKK
Collaborator Author

AI Code Review - PR #962

Status: BLOCKING

Summary: P0/0 · P1/3 · P2/4 · P3/0

Blocking Issues

P1

  • deps/http.bzl comment leaks an internal developer machine's absolute path and a Claude plan filename @ deps/http.bzl:11
    • Suggestion: delete the line, or keep only a neutral description (e.g. # See internal_source pip_unify Phase 5 plan) and move the plan itself into internal_source/ documentation
  • Lockfiles are out of sync with the new requirements; pip_parse on several platforms still resolves the old versions @ deps/requirements_torch_gpu_cuda12.txt:9
    • Suggestion: regenerate and commit the lockfiles for every affected platform (cuda12/cuda12_arm/rocm/torch_cpu/torch_arm); at minimum run the resolution phase of bazel build //... locally per platform to verify lockfile and input agree
  • cuda12 requirements drop nvidia-nvshmem-cu12 but deep-ep still depends on the NVSHMEM runtime @ deps/requirements_torch_gpu_cuda12.txt:6
    • Suggestion: verify whether the deep-ep 1.2.1.10 wheel bundles libnvshmem; if not, restore the explicit nvidia-nvshmem-cu12==3.4.5 dependency and add a cuda12 MoE smoke test
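The bundling check suggested for deep-ep can be scripted: a wheel is a zip archive, so listing its members shows whether a shared library is vendored in. A minimal sketch (the wheel filename and library name below are assumptions for illustration):

```python
import zipfile


def wheel_bundles_lib(wheel_file, needle: str) -> bool:
    """True if any member of the wheel (a zip archive) contains `needle`."""
    with zipfile.ZipFile(wheel_file) as zf:
        return any(needle in member for member in zf.namelist())


# Usage (hypothetical wheel filename):
# wheel_bundles_lib("deep_ep-1.2.1.10-cp310-cp310-linux_x86_64.whl", "libnvshmem")
```

If this returns False for the published wheel, the explicit nvidia-nvshmem-cu12 pin is still required.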

Non-blocking Suggestions

P2

  • cuda12_arm torchvision==0.24.0 carries no local label and may resolve to a CPU or wrong-platform wheel @ deps/requirements_cuda12_arm.txt:7
    • Suggestion: rewrite as torchvision==0.24.0+cu129 or keep the explicit wheel URL; after lockfile regen, grep to confirm the wheel URL contains cu129 and aarch64
  • cuda12 flashinfer-python jumps from 0.2.5 to 0.6.0 without caller-side verification @ deps/requirements_torch_gpu_cuda12.txt:9
    • Suggestion: merge only after the cuda12 smoke suite (mha/mla/MoE) passes; or list the confirmed caller-side adaptation status in the PR description
  • Input requirements lose their explicit version pins; future lockfile regens may introduce surprise upgrades @ deps/requirements_torch_gpu_cuda12.txt:2
    • Suggestion: keep lower bounds or exact pins in the input requirements (e.g. autoawq>=0.2.9, apache-tvm-ffi==0.1.1) so they form a second line of defense alongside the lockfile
  • The PR test plan covers only cuda12_9; the other 4 platform builds are unverified locally @ deps/pip.bzl:10
    • Suggestion: before merging, run at least the cuda12/rocm/cpu_arm bazel builds; or state explicitly in the PR description that full verification is delegated to CI before merge

Checklist Violations (5 fail / 23 total)

General Principles Checklist

  • [6.1] Architecture — layering boundaries: new concepts live at the right layer, no internal leakage → issue: deps/http.bzl comment leaks an internal developer machine's absolute path and a Claude plan filename
    The comment at deps/http.bzl:11 embeds a personal plan markdown filename under a local home directory's .claude/plans/, leaking internal/personal development-environment paths into opensource deps/ and violating the layering boundary
  • [6.1] Architecture — compatibility: public API / persisted data / config / environment migrations are safe → issue: lockfiles are out of sync with the new requirements; pip_parse on several platforms still resolves the old versions
    The requirements upgrade several core wheels such as deep-ep/flashinfer-python and simultaneously drop nvidia-nvshmem-cu12, but the lockfiles were not regenerated in sync; downstream caller adaptation is unverified
  • [6.1] Tests — new logic has focused unit tests plus related integration/smoke tests → issue: the PR test plan covers only cuda12_9; the other 4 platform builds are unverified locally
    The test plan checks only the cuda12_9 bazel build; none of the five pip_parse paths (cuda12/cuda12_arm/rocm/cpu/cpu_arm) was verified locally, and the CI box is unchecked
  • [6.1] Tests — distributed/cross-platform changes have matching coverage → issue: the PR test plan covers only cuda12_9; the other 4 platform builds are unverified locally
    The diff affects lockfile resolution behavior on 5 platforms, but the test plan covers only the cuda12_9 build
  • [6.1] Quality — commits are atomic and the message matches the behavior → issue: lockfiles are out of sync with the new requirements; pip_parse on several platforms still resolves the old versions
    The commit message says "lockfiles regenerated against the unified indexes", but the diff contains no requirements_lock*.txt files, so the message does not match the behavior

Strengths

  • The new index-provenance comment at the top of pip.bzl spells out the role and coverage of the three source classes (download.pytorch.org / rtp-opensource / aliyun), easing future maintenance
  • The rtp_opensource_deps local_repository added to WORKSPACE carries a comment explaining its relationship to rtp_deps and the .internal_bazelrc --override_repository, guarding against accidental later edits
  • Switching the multi-platform requirements from direct wheel URLs to PEP 503 name==version+local form reduces the maintenance cost of wheel URL changes

@LLLLKKKK
Collaborator Author

LLLLKKKK commented May 1, 2026

AI Code Review - PR #962

Status: BLOCKING

Summary: P0/0 · P1/1 · P2/0 · P3/1

Blocking Issues

P1

  • The regenerated lockfile leaks a developer's absolute path @ deps/requirements_lock_torch_gpu_cuda12_9.txt:143
    • Suggestion: when rerunning update_pip.sh, make sure the cwd is github-opensource/ or pass --directory / relative source-file paths explicitly to uv pip compile so the generated # via -r ... comments are relative; grep for home/ before committing. Otherwise running bazel run //deps:requirements_*.update on another machine produces a different diff, breaking reproducibility, and bakes a developer username (identifiable as an employee) into the opensource lockfile.
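The grep-for-home check suggested here can be automated as a pre-commit gate; a minimal sketch that flags `# via` provenance comments pointing at absolute paths (the comment format below matches pip-compile/uv output, the marker choices are assumptions):

```python
import re


def find_absolute_via_paths(lock_text: str) -> list:
    """Return '# via -r ...' comment lines whose referenced path is absolute."""
    bad = []
    for line in lock_text.splitlines():
        stripped = line.strip()
        # uv / pip-compile provenance comments look like "# via -r <path>".
        if stripped.startswith("#") and "-r " in stripped:
            path = stripped.split("-r ", 1)[1].split()[0]
            if path.startswith("/") or re.search(r"/home/", path):
                bad.append(stripped)
    return bad
```

Running this over each requirements_lock_*.txt and failing on a non-empty result would have caught the leak before review.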

Non-blocking Suggestions

P3

  • apache-tvm-ffi versions drift between lockfiles @ deps/requirements_lock_cuda12_arm.txt:159
    • Suggestion: pin the apache-tvm-ffi version explicitly in requirements_base.txt or the corresponding thin requirements so per-platform lockfiles cannot drift with resolver choices; if the divergence is intentional, add a comment.

Checklist Violations (1 fail / 23 total)

General Principles Checklist

  • [6.1] Quality — logic changes carry no unrelated formatting noise → issue: the regenerated lockfile leaks a developer's absolute path
    In 4 lockfiles the # via comments changed from repo-relative paths such as requirements_base.txt to absolute path fragments containing a developer username (about 262 occurrences). This is unrelated noise mixed into a versioned artifact, and it diffs with every generating machine, breaking reproducibility.

Strengths

  • The new strict-separation comment above pip.bzl/PIP_EXTRA_ARGS clearly explains the --index-url vs --extra-index-url semantics, the hard constraint that artlab must never be exposed, the SJTU mirror's 301→403 compatibility trap, and why download.pytorch.org was chosen, helping future maintainers judge the impact of adding a new index.
  • The BUILD comment on keeping using_arm/using_cpu as label-only config_settings explains that select() branches still reference these labels but will never match again, preventing a later maintainer from deleting them and breaking unresolved labels in downstream BUILD files.
  • WORKSPACE introduces rtp_opensource_deps as a stable handle and explains its relationship to rtp_deps (replaced by internal_source via --override_repository), so the internal overlay can still -r include the opensource requirements and shared http_archives.
  • The commit message fully records the abandoned SJTU mirror experiment, the list of deleted CPU/ARM build paths, and the corresponding cleanup points, making phase 5 (http_archive multi-URL consolidation) easy to trace later.

Copilot AI review requested due to automatic review settings May 1, 2026 08:00
Contributor

Copilot AI left a comment


Pull request overview

This PR updates the Bazel Python dependency flow to resolve wheels consistently across platforms by switching to a unified set of PEP 503 indexes (Aliyun PyPI mirror + download.pytorch.org per-accelerator + an OSS “simple/” index for custom wheels), and regenerates lockfiles accordingly.

Changes:

  • Reworked deps/pip.bzl to use a consolidated index list and simplified (thin) requirements inputs, with lockfiles regenerated by uv.
  • Removed CPU-only and ARM-CPU pip/lock flows and Bazel configs, and adjusted default dependency selection to fall through to CUDA 12.9.
  • Regenerated CUDA12.9 / ROCm / CUDA12 ARM lockfiles to match the unified index configuration and new “thin requirements” approach.

Reviewed changes

Copilot reviewed 16 out of 19 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
deps/requirements_torch_gpu_cuda12_9.txt Converts CUDA12.9 requirements from direct wheel URLs to pinned packages + PEP 503/simple URLs where needed.
deps/requirements_rocm.txt Updates ROCm requirements to use unified indexes and thin inputs.
deps/requirements_cuda12_arm.txt Updates CUDA12 ARM requirements to the new thin input style and unified indexes.
deps/requirements_lock_torch_gpu_cuda12_9.txt Regenerated CUDA12.9 lockfile via uv, embedding unified index configuration.
deps/requirements_lock_rocm.txt Regenerated ROCm lockfile via uv, embedding unified index configuration.
deps/requirements_lock_cuda12_arm.txt Regenerated CUDA12 ARM lockfile via uv, embedding unified index configuration.
deps/pip.bzl Centralizes pip index config and updates pip_parse repos to the new lockfiles.
deps/BUILD Removes deprecated compile targets and keeps only CUDA12.9 / ROCm / CUDA12 ARM compile targets.
deps/http.bzl Removes CPU torch archives and adds a TODO about consolidating duplicate http_archives across open/internal.
arch_config/arch_select.bzl Drops CPU/ARM-CPU branches and changes default selection to CUDA12.9.
WORKSPACE Adds rtp_opensource_deps and removes CPU/ARM-CPU pip repo installs.
BUILD Notes that using_arm / using_cpu config_settings are now label-only and never match.
.bazelrc Removes --config=cpu and --config=arm build configs.
Comments suppressed due to low confidence (2)

deps/pip.bzl:46

  • pip_deps() no longer declares a pip_parse repo for CUDA12 (cu126), but arch_config/arch_select.bzl still references @pip_gpu_cuda12_torch for cuda_pre_12_9. This will break --config=cuda12 (and any other path that needs the cu126 Python deps). Restore a pip_gpu_cuda12_torch pip_parse with a cu126 lockfile, or remove/retarget the cuda_pre_12_9 dependency path so it doesn’t reference a missing repo.
def pip_deps():
    pip_parse(
        name = "pip_ppu_torch",
        requirements_lock = "@rtp_deps//:requirements_lock_torch_gpu_cuda12_9.txt",
        python_interpreter = "/opt/conda310/bin/python3",
        extra_pip_args = PIP_EXTRA_ARGS,
        timeout = 3600,
    )

    pip_parse(
        name = "pip_gpu_cuda12_9_torch",
        requirements_lock = "@rtp_deps//:requirements_lock_torch_gpu_cuda12_9.txt",
        python_interpreter = "/opt/conda310/bin/python3",
        extra_pip_args = PIP_EXTRA_ARGS,
        timeout = 3600,
        quiet = False,
    )

WORKSPACE:56

  • WORKSPACE no longer loads/calls pip_gpu_cuda12_torch_install_deps(), but the repo is still referenced from arch_config/arch_select.bzl (@pip_gpu_cuda12_torch). If CUDA12 (cu126) remains supported, pip_deps() and WORKSPACE both need to create/install that repo; if it’s being dropped, please also remove the remaining pip_gpu_cuda12_torch references and associated select branches/configs so workspace loading can’t fail.
load("@rtp_deps//:pip.bzl", "pip_deps")

pip_deps()

load("@pip_ppu_torch//:requirements.bzl", pip_ppu_torch_install_deps = "install_deps")
pip_ppu_torch_install_deps()

load("@pip_gpu_cuda12_9_torch//:requirements.bzl", pip_gpu_cuda12_9_torch_install_deps = "install_deps")
pip_gpu_cuda12_9_torch_install_deps()

load("@pip_cuda12_arm_torch//:requirements.bzl", pip_cuda12_arm_torch_install_deps = "install_deps")
pip_cuda12_arm_torch_install_deps()


Comment thread deps/http.bzl Outdated
Comment on lines +10 to +11
# availability regressions.
# See /home/liukan.lk/.claude/plans/serialized-wibbling-snail.md Phase 5.
Comment thread arch_config/arch_select.bzl Outdated
# to wrapper target relate with different system config
load("@pip_cpu_torch//:requirements.bzl", requirement_cpu="requirement")
load("@pip_arm_torch//:requirements.bzl", requirement_arm="requirement")
load("@pip_gpu_cuda12_torch//:requirements.bzl", requirement_gpu_cuda12="requirement")
@LLLLKKKK
Collaborator Author

LLLLKKKK commented May 1, 2026

AI Code Review - PR #962

Status: BLOCKING

Summary: P0/0 · P1/2 · P2/2 · P3/0

Blocking Issues

P1

  • deps/http.bzl comment leaks a developer's local path and an internal plan filename @ deps/http.bzl:10
    • Suggestion: delete the See ... line, or point it at a publicly accessible issue/PR number instead (e.g. # See PR #962 / linked tracking issue).
  • # via comments in 4 lockfiles contain a github-opensource/deps/ path prefix (regenerated from the wrong cwd) @ deps/requirements_lock_torch_gpu_cuda12_9.txt:140
    • Suggestion: regenerate all 4 lockfiles from inside the github-opensource/ subdirectory (or set the working directory explicitly in compile_pip_requirements extra_args) so the # via annotations are relative paths of the form -r requirements_base.txt.

Non-blocking Suggestions

P2

  • pip_ppu_torch switches to the cuda12_9 lockfile, changing the PPU platform's torch version @ deps/pip.bzl:30
    • Suggestion: before merging, run a bazel build plus a basic smoke test on the PPU platform to confirm that after the switch all requirement(...) transitives still resolve and the PPU path is not dragged into cu129 torch binary dependencies; if PPU must stay on torch 2.6, distinguish it explicitly in the internal arch_select.
  • WORKSPACE adds the rtp_opensource_deps alias but this PR contains no consumer @ WORKSPACE:21
    • Suggestion: link the concrete internal usage of @rtp_opensource_deps in the PR description/commit message; if the internal change is not yet submitted, consider splitting this out and landing it once the internal side is ready, to avoid introducing a temporarily unreferenced repository alias.

Checklist Violations (4 fail / 23 total)

General Principles Checklist

  • [6.1] Software Engineering — KISS/YAGNI: no speculative abstraction → issue: WORKSPACE adds the rtp_opensource_deps alias but this PR contains no consumer
    WORKSPACE adds the rtp_opensource_deps local_repository alias, but the PR diff contains no @rtp_opensource_deps//... reference; the internal overlay consumer promised by the comment does not appear in this PR.
  • [6.1] Architecture — layering boundaries: new concepts live at the right layer, no internal leakage → issue: deps/http.bzl comment leaks a developer's local path and an internal plan filename
    The new comment in deps/http.bzl introduces a developer's local home-directory absolute path and an internal planning filename, leaking internal state into opensource source comments.
  • [6.1] Architecture — compatibility: public API / persisted data / config / environment migrations are safe → checklist-only
    After --config=cpu and --config=arm are removed from .bazelrc, downstream users still referencing these configs will not fail immediately at build time but silently fall back to the default (now switched to cu129). The BUILD comment documents that keeping using_arm etc. as label-only config_settings is intentional, and the fallback path is documented.
  • [6.1] Tests — distributed/cross-platform changes have matching coverage → issue: pip_ppu_torch switches to the cuda12_9 lockfile, changing the PPU platform's torch version
    After the lockfile switch, PPU torch resolution jumps from cu126/torch2.6 to cu129/torch2.8, and the PR description lists no PPU CI verification; ROCm/cuda12_arm now resolve from the regenerated lockfiles by default and also need matching platform CI validation.

Strengths

  • Every deletion/default switch carries a clear rationale comment with rollback guidance (e.g. why label-only using_arm is kept in BUILD, and the note on switching //conditions:default to cu129 in arch_select.bzl)
  • The PIP_EXTRA_ARGS comment explains clearly that --index-url must override the container env PIP_INDEX_URL (so the opensource side cannot silently hit the internal artlab); the strict-separation rule is well written
  • The TODO comment at the top of http.bzl lists the http_archives to be merged in Phase 5 and the multi-URL fallback strategy, easing the follow-up

LLLLKKKK force-pushed the feature/pip_unify_v2 branch from ef772e7 to e4f6072 on May 1, 2026 10:27
@LLLLKKKK
Collaborator Author

LLLLKKKK commented May 1, 2026

AI Code Review - PR #962

Status: BLOCKING

Summary: P0/1 · P1/0 · P2/1 · P3/1

Blocking Issues

P0

  • open-source arch_select.bzl still loads @pip_gpu_cuda12_torch but the pip_parse was deleted, so opensource workspace evaluation will fail @ arch_config/arch_select.bzl:4
    • Suggestion: pick one: (1) remove the requirement_gpu_cuda12 load from arch_select.bzl and fold the cuda_pre_12_9 branch into using_cuda12_9_x86 or default; (2) if the cuda12 (CUDA 12.4/12.6) opensource build path must stay, re-add pip_parse(name="pip_gpu_cuda12_torch", ...) in deps/pip.bzl and call install_deps in WORKSPACE. Verify first that both bazel build --config=cuda12_9 //:th_transformer and bazel build --config=cuda12 //:th_transformer pass in a pure opensource worktree.

Non-blocking Suggestions

P2

  • BUILD keeps the using_arm / using_cpu config_settings but no build:config sets the corresponding define, so select branches silently take the default @ BUILD:62
    • Suggestion: alongside the comment, give using_arm/using_cpu a deprecation = "..." attribute or add a cleanup tracker to the comment (issue link or follow-up PR), and grep for using_arm / using_cpu references across repos to also delete the branches that are known to be dead.

P3

  • PIP_EXTRA_ARGS overrides the environment variable via --index-url; this behavior change deserves more prominent mention in the PR description/migration notes @ deps/pip.bzl:1
    • Suggestion: add a section to the PR description or any existing README/HOWTO under deps/: list the current 5 indexes, emphasize that no internal mirror is used, and state the execution constraints for bazel run //deps:requirements_<cfg>.update.

Checklist Violations (2 fail / 23 total)

General Principles Checklist

  • [6.1] Architecture — compatibility: public API / persisted data / config / environment migrations are safe → issue: open-source arch_select.bzl still loads @pip_gpu_cuda12_torch but the pip_parse was deleted, so opensource workspace evaluation will fail
    arch_config/arch_select.bzl still loads @pip_gpu_cuda12_torch, but deps/pip.bzl deletes the corresponding pip_parse and WORKSPACE deletes install_deps; any --config build of the opensource repo fails outright on the unresolved repository, breaking opensource-side compatibility.
  • [6.1] Quality — the PR description explains motivation and design → issue: PIP_EXTRA_ARGS overrides the environment variable via --index-url; this behavior change deserves more prominent mention in the PR description/migration notes
    deps/pip.bzl hard-overrides PIP_INDEX_URL with --index-url and adds 4 download.pytorch.org extra-index-urls, a policy change affecting every future lock-regeneration flow; the PR description / deps README should explicitly list the 5 indexes and the execution constraints.

Strengths

  • Lockfiles are updated per platform (cuda12 / cuda12_9 / rocm / arm / cpu), and the torch install source moves from the aliyun mirror to download.pytorch.org, making the wheel links reproducible.
  • The comments in deps/http.bzl and deps/pip.bzl clearly describe why --index-url now overrides the container env and why the mirror is unusable, helping future maintainers understand the decision background.

LLLLKKKK force-pushed the feature/pip_unify_v2 branch from e4f6072 to 735509e on May 1, 2026 13:35
@LLLLKKKK
Collaborator Author

LLLLKKKK commented May 1, 2026

AI Code Review - PR #962

Status: BLOCKING

Summary: P0/1 · P1/2 · P2/2 · P3/0

Blocking Issues

P0

  • arch_select.bzl still loads the deleted @pip_gpu_cuda12_torch @ arch_config/arch_select.bzl:3
    • Suggestion: delete the @pip_gpu_cuda12_torch load and the cuda_pre_12_9 → requirement_gpu_cuda12 branch from arch_select.bzl in the same change; if the cu126 entry point must stay, restore the corresponding pip_parse in deps/pip.bzl and the WORKSPACE install_deps. Before landing, run bazel query //... --config=cuda12_9 once in the opensource workspace to verify arch_select.bzl can be loaded.

P1

  • The pip_ppu_torch lockfile switches from cuda12 to cuda12_9, a major torch version upgrade @ deps/pip.bzl:25
    • Suggestion: decide the target torch/CUDA version for PPU. If PPU still needs cu126, keep requirements_lock_torch_gpu_cuda12.txt and its lockfile; if the upgrade is intentional, verify a bazel build plus an inference-startup smoke test in the PPU container and note the impact in the PR description.
  • nvidia-cutlass-dsl rolls back from 4.4.1 to 4.3.5 @ deps/requirements_lock_torch_gpu_cuda12_9.txt:2247
    • Suggestion: explain the rollback to 4.3.5 in the PR description (whether 4.4.1 hit a known bug, or it is for flashinfer 0.6.6 compatibility). Trigger a cuda12_9 perf/smoke run to verify the GEMM/MoE paths have not regressed.

Non-blocking Suggestions

P2

  • flash-mla commits differ between the cuda12_arm and cuda12_9 lockfiles @ deps/requirements_cuda12_arm.txt:9
    • Suggestion: unify the commit on both sides (preferably the newer ca58fed for both), or record in the lockfile/upper-level README why arm temporarily stays on the old commit plus a tracking issue. Otherwise later perf/correctness diffs cannot quickly isolate the source of divergence.
  • The opensource .bazelrc drops build:cpu/build:arm but the config_settings kept in BUILD have no cleanup landed @ BUILD:62
    • Suggestion: list the follow-up cleanup plan in the PR description (when the :using_arm/:using_cpu select branches in rtp_llm/BUILD and barex_rdma/BUILD will be removed), or simply clean up those select branches in this PR to avoid long-lived dead code.

Checklist Violations (6 fail / 23 total)

General Principles Checklist

  • [6.1] Architecture — dependency direction: no cycles / cross-layer surprises → issue: arch_select.bzl still loads the deleted @pip_gpu_cuda12_torch
    arch_select.bzl still loads @pip_gpu_cuda12_torch, but deps/pip.bzl no longer registers that external repository, so the opensource bazel loading phase fails outright — a dependency edge now points at nothing.
  • [6.1] Architecture — compatibility: public API / persisted data / config / environment migrations are safe → issue: the pip_ppu_torch lockfile switches from cuda12 to cuda12_9, a major torch version upgrade
    The PPU lockfile moves from cu126/torch 2.6 to cu129/torch 2.8, across a CUDA minor version, with compatibility against the PPU container's CUDA 12.3 base unverified; meanwhile nvidia-cutlass-dsl rolls back from 4.4.1 to 4.3.5. Both are implicit public dependency migrations whose impact on existing builds/deployments needs to be assessed in the PR description.
  • [6.1] Tests — new logic has focused unit tests plus related integration/smoke tests → issue: arch_select.bzl still loads the deleted @pip_gpu_cuda12_torch
    The PR changes .bazelrc / WORKSPACE / pip.bzl / arch_select.bzl / 7 lockfiles but attaches no bazel build verification evidence; the TODO comment in http.bzl itself says "requires per-config bazel build verification before landing".
  • [6.1] Tests — distributed/cross-platform changes have matching coverage → issue: flash-mla commits differ between the cuda12_arm and cuda12_9 lockfiles
    The PR touches lockfiles for cuda12 / cuda12_9 / cuda12_9_arm / rocm / ppu, yet contains no cross-platform bazel build verification; the flash-mla arm/x86 commit mismatch also lacks a cross-platform consistency check.
  • [6.1] Quality — mega-PRs are split into independent changes → issue: the pip_ppu_torch lockfile switches from cuda12 to cuda12_9, a major torch version upgrade
    The PR simultaneously (a) deletes the cpu/arm build configs, (b) reworks the pip index/lockfile format, (c) upgrades PPU torch across a major version, and (d) rolls back the cutlass-dsl version. (c) and (d) are orthogonal to the pip_unify theme and could be split out for separate review/rollback.
  • [6.1] Quality — the PR description explains motivation and design → issue: the opensource .bazelrc drops build:cpu/build:arm but the config_settings kept in BUILD have no cleanup landed
    The PR does not explain (a) the reason for the cutlass-dsl 4.4.1→4.3.5 rollback, (b) whether the PPU switch to cu129 was verified, or (c) the cleanup plan for the dead select branches left in BUILD. All of these are key "why" information readers need.

Strengths

  • The new rtp_opensource_deps local_repository lets the internal overlay reuse the opensource lockfiles, avoiding drift between two -r wrapper copies
  • PIP_EXTRA_ARGS explicitly overrides the container PIP_INDEX_URL with --index-url, and the comment clearly explains the artlab isolation rule
  • Every deletion point in BUILD/WORKSPACE/pip.bzl/http.bzl carries a rationale comment (CPU/ARM path retirement, SJTU mirror unavailability), easing future tracing
  • Switching from bare URLs to PyPI semver pins makes the lockfiles readable and reusable by uv, and tightens hash verification

Copilot AI review requested due to automatic review settings May 1, 2026 15:28
Contributor

Copilot AI left a comment


Pull request overview

This PR updates the Bazel/Python dependency workflow to resolve wheels from a unified set of PEP 503 indexes, switching to “thin” per-platform requirement inputs (shared requirements_base.txt) and regenerated lockfiles so the same filenames/versions resolve consistently across environments.

Changes:

  • Standardize pip index configuration via deps/pip.bzl (--index-url + PEP 503 --extra-index-url list).
  • Replace direct wheel URLs in requirement inputs with pinned package specs and PEP 503/simple-based references; regenerate lockfiles accordingly.
  • Remove legacy CPU/CUDA12(cu126) requirement sources/targets and adjust Bazel selects and WORKSPACE wiring.

Reviewed changes

Copilot reviewed 16 out of 19 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
deps/pip.bzl Defines unified pip index configuration and the remaining pip_parse repos.
deps/BUILD Exports per-config requirements files and removes compile targets for dropped configs.
deps/http.bzl Removes CPU torch http_archives; adds TODOs for future consolidation with internal overlay.
deps/requirements_torch_gpu_cuda12_9.txt Converts to pinned “thin” input requirements for cu129 x86_64.
deps/requirements_lock_torch_gpu_cuda12_9.txt Regenerated lockfile using unified index set.
deps/requirements_rocm.txt Updates ROCm requirements to use pinned specs and unified indexes.
deps/requirements_lock_rocm.txt Regenerated ROCm lockfile using unified index set.
deps/requirements_cuda12_arm.txt Adds pinned CUDA12-arm requirements aligned with unified indexes.
deps/requirements_lock_cuda12_arm.txt Regenerated CUDA12-arm lockfile using unified index set.
arch_config/arch_select.bzl Removes CPU/ARM-CPU pip repos, changes default Python deps selection, and adjusts torch deps selection.
WORKSPACE Adds rtp_opensource_deps and removes some pip repo installs; still loads @pip_ppu_torch.
BUILD Keeps using_arm/using_cpu config_settings as “label-only” placeholders.
.bazelrc Removes --config=cpu and --config=arm sections.
deps/requirements_torch_gpu_cuda12.txt (deleted) Removes legacy cu126 GPU requirements input.
deps/requirements_torch_cpu.txt (deleted) Removes legacy CPU torch requirements input.
deps/requirements_cpu_arm.txt (deleted) Removes legacy ARM CPU requirements input.


Comment thread WORKSPACE
Comment on lines 49 to 51
load("@pip_ppu_torch//:requirements.bzl", pip_ppu_torch_install_deps = "install_deps")
pip_ppu_torch_install_deps()

Comment thread arch_config/arch_select.bzl Outdated
Comment on lines +2 to +9
load("@pip_gpu_cuda12_9_torch//:requirements.bzl", requirement_gpu_cuda12_9="requirement")
load("@pip_gpu_rocm_torch//:requirements.bzl", requirement_gpu_rocm="requirement")
load("@rtp_llm//bazel:defs.bzl", "copy_so")

# cuda12 (cu126/torch2.6) was dropped — pip_gpu_cuda12_torch no longer registered
# in deps/pip.bzl. Remaining `cuda_pre_12_9` select branches resolve via cuda12_9.
requirement_gpu_cuda12 = requirement_gpu_cuda12_9

@@ -77,26 +80,16 @@ def torch_deps():
"@torch_rocm//:torch",
"@torch_rocm//:torch_libs",
],
@LLLLKKKK
Collaborator Author

LLLLKKKK commented May 1, 2026

AI Code Review - PR #962

Status: LGTM

Summary: P0/0 · P1/0 · P2/1 · P3/1

lgtm ready to ci

Non-blocking Suggestions

P2

  • The cuda_pre_12_9 path mismatches torch native and Python wheel ABIs @ arch_config/arch_select.bzl:8
    • Suggestion: fully retire the cu126 path: delete the build:cuda12 ... config group from .bazelrc, torch_2.6_py310_cuda from deps/http.bzl, and the cuda_pre_12_9 select branch plus the requirement_gpu_cuda12 alias from arch_select.bzl; or, if the cuda12 config is kept, add a matching cu126 pip_parse + lockfile so no intermediate state mixes native and Python versions. The current inline comment only acknowledges the intent without actually converging.

P3

  • using_arm/using_cpu config_settings are kept while the matching --config is removed, so select silently takes the default @ BUILD:62
    • Suggestion: mark the config_setting deprecated (BUILD comment plus removing the select branches at the reference sites), or add an explicit disabling hint in .bazelrc; the current comment only explains why the label is kept, not that it is already a dead branch in select.

Checklist Violations (3 fail / 23 total)

General Principles Checklist

  • [6.1] Architecture — error semantics: fail-fast/retry/fallback/silent behavior is explicit → issue: using_arm/using_cpu config_settings are kept while the matching --config is removed, so select silently takes the default
    BUILD:62-67 explains that the using_arm/using_cpu config_settings are kept because select still references them, but the corresponding --config=arm/cpu is gone: a user following old docs who passes --define=using_arm=true will never hit those branches and silently falls through to default instead of getting an explicit error.
  • [6.1] Architecture — compatibility: public API / persisted data / config / environment migrations are safe → issue: the cuda_pre_12_9 path mismatches torch native and Python wheel ABIs
    requirement_gpu_cuda12 = requirement_gpu_cuda12_9 at arch_config/arch_select.bzl:8 silently switches the cu126 path's Python wheels to cu129, while --config=cuda12 still exists in .bazelrc and torch_deps still loads torch_2.6_py310_cuda, producing a native/Python ABI mismatch. Original cuda12 users who still use that config get a mixed stack that is unusable at runtime.
  • [6.1] Tests — distributed/cross-platform changes have matching coverage → checklist-only
    The PR touches lockfiles on three platforms (cuda12_9 / cuda12_arm / rocm), but the PR description gives no local or CI verification evidence for cuda12_arm and rocm; in particular, the cuda_pre_12_9 fallback is unverified under a cuda12 build.

Strengths

  • Every deletion/redirection carries an inline comment stating the motivation (pip.bzl rejecting artlab, the SJTU mirror's 403 behavior, the cuda_pre_12_9 fallback path), keeping future tracing cheap
  • The lockfiles cover cuda12_arm / rocm / cuda12_9 in one sweep, so the pip-compile→uv switch cannot leave some configs drifting
  • WORKSPACE removes the retired install_deps calls in sync, keeping pip_parse registrations and install_deps calls one-to-one

LLLLKKKK force-pushed the feature/pip_unify_v2 branch from 0a1800a to 642ed26 on May 1, 2026 15:43
@wht21
Collaborator

wht21 commented May 1, 2026

internal source has been updated, please review the changes!

Copilot AI review requested due to automatic review settings May 1, 2026 16:11
Contributor

Copilot AI left a comment


Pull request overview

Updates the Bazel/Python dependency plumbing to use a unified set of PEP 503 indexes (Aliyun PyPI mirror + PyTorch per-platform indexes + RTP OSS “simple” bucket) and shifts the repository’s default CUDA build/test config from cuda12_6 to cuda12_9.

Changes:

  • Replace per-platform “full” requirements with thin inputs and regenerated lockfiles (now generated by uv).
  • Simplify Bazel pip parsing to focus on cuda12_9 + cuda12_arm + ROCm and remove older CPU/CUDA12.6-era branches/configs.
  • Update scripts/docs/test configs to use --config=cuda12_9.

Reviewed changes

Copilot reviewed 30 out of 33 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
rtp_llm/test/perf_test/multi_node/multi_runner.sh Updates default Bazel build args to --config=cuda12_9.
rtp_llm/test/perf_test/multi_node/multi_benchmark_config.yaml Updates benchmark build args to --config=cuda12_9.
rtp_llm/models_py/standalone/BUILD Removes cuda_pre_12_9 select branches.
rtp_llm/models_py/modules/hybrid/test/BUILD Removes cuda_pre_12_9 select branches for test deps.
rtp_llm/models_py/modules/factory/attention/cuda_impl/test/BUILD Removes cuda_pre_12_9 select branches for test deps.
rtp_llm/models_py/modules/factory/attention/cuda_cp_impl/test/BUILD Removes cuda_pre_12_9 select branches for test deps.
rtp_llm/models_py/kernels/cuda/test/BUILD Removes commented cuda_pre_12_9 branch in a disabled test stanza.
rtp_llm/BUILD Removes ARM/CPU-specific select branches and cuda_pre_12_9 branches in Python deps lists.
docs/start/install.md Updates example build command/config to cuda12_9.
docs/references/profiling.md Updates profiling command to cuda12_9.
docs/references/debug.md Updates debug test command to cuda12_9.
docs/benchmark/benchmark.md Updates benchmark doc snippet to cuda12_9.
docs/backend/3fs.md Updates build example to cuda12_9.
deps/requirements_torch_gpu_cuda12_9.txt Converts to normalized names + PEP503 style references/pins for unified resolution.
deps/requirements_torch_gpu_cuda12.txt Removes legacy CUDA12.6 requirements input.
deps/requirements_torch_cpu.txt Removes legacy CPU requirements input.
deps/requirements_rocm.txt Switches ROCm requirements to unified indexes and pinned names/versions.
deps/requirements_lock_torch_gpu_cuda12_9.txt Regenerated lockfile (uv) with unified index configuration and hashes.
deps/requirements_lock_rocm.txt Regenerated lockfile (uv) with unified index configuration and hashes.
deps/requirements_lock_cuda12_arm.txt Regenerated lockfile (uv) for the cuda12_arm config.
deps/requirements_cuda12_arm.txt Defines thin cuda12_arm requirements input (torch/torchvision + custom wheels).
deps/requirements_cpu_arm.txt Removes legacy ARM-CPU requirements input.
deps/pip.bzl Defines unified PIP_EXTRA_ARGS; drops older pip repos and adds pip_cuda12_arm_torch.
deps/http.bzl Removes older torch http_archives; keeps torch 2.8 CUDA wheel and ROCm torch wheel.
deps/BUILD Exports per-config requirement sources and drops legacy compile targets.
arch_config/arch_select.bzl Removes CPU/ARM-CPU requirement routing and cuda_pre_12_9 routing; simplifies torch deps selection.
WORKSPACE Adds rtp_opensource_deps repo and removes legacy pip installs, while keeping CUDA12.9/ROCm/ARM installs.
BUILD.pytorch Removes cuda_pre_12_9 linkopts branch.
BUILD Removes cuda_pre_12_9, using_arm, using_cpu config settings; updates compdb refresh target to cuda12_9.
.bazelrc Removes cuda12_2, cuda12_6, cpu, arm configs; keeps cuda12_9 configs.


Comment thread WORKSPACE
Comment on lines 49 to 51
load("@pip_ppu_torch//:requirements.bzl", pip_ppu_torch_install_deps = "install_deps")
pip_ppu_torch_install_deps()

@LLLLKKKK
Collaborator Author

LLLLKKKK commented May 1, 2026

AI Code Review - PR #962

Status: LGTM

Summary: P0/0 · P1/0 · P2/1 · P3/3

lgtm ready to ci

Non-blocking Suggestions

P2

  • The requirement() default branch makes cuda12_arm pull the x86 cuda12_9 wheel @ arch_config/arch_select.bzl:15
    • Suggestion: at the top of arch_select.bzl, load @pip_cuda12_arm_torch//:requirements.bzl as requirement_cuda12_arm and add an @rtp_llm//:using_cuda12_arm branch in requirement()/torch_deps() routing to that lockfile; or explicitly declare cuda12_arm unsupported (and withdraw the pip_cuda12_arm_torch registration plus the cuda12_9_arm bazelrc/requirements_lock). Keeping the cuda12_arm config while not wiring it up is inconsistent.

P3

  • The whl_deps() comment/default value disagrees with the torch version in the cuda12_arm lockfile @ arch_config/arch_select.bzl:59
    • Suggestion: either correct the comment to 'cuda12_9_x86 uses torch 2.8+cu129; cuda12_9_arm uses torch 2.9+cu129' and add a dedicated select branch for cuda12_arm returning torch==2.9.0+cu129, or align the version with the cuda12_arm lockfile.
  • rtp-kernel/deep-ep versions drift between the cuda12_9 and cuda12_arm lockfiles @ deps/requirements_lock_cuda12_arm.txt:1
    • Suggestion: have the update_pip flow publish wheels built from the same upstream commit to both indexes and regenerate both lockfiles in one pass; or state explicitly in the PR description why version drift is acceptable.
  • requirement()'s default branch downloads the cuda12_9 wheel, hurting the pure-CPU test experience @ arch_config/arch_select.bzl:20
    • Suggestion: if the team no longer supports the CPU/ARM-CPU paths, make build --config=cuda12_9 the implicit default in .bazelrc; otherwise give the default branch a lightweight empty fallback or an explicit error, so huge wheels are not pulled silently.
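The P2 routing fix could look roughly like this. This is a sketch only: requirement_cuda12_arm and the :using_cuda12_arm config_setting are names taken from the suggestion, not verified against the actual arch_select.bzl.

```starlark
# Sketch of the suggested routing in arch_config/arch_select.bzl.
# `requirement_cuda12_arm` and `:using_cuda12_arm` are assumed names.
load("@pip_cuda12_arm_torch//:requirements.bzl", requirement_cuda12_arm = "requirement")

def requirement(name):
    return select({
        "@rtp_llm//:using_cuda12_9_x86": [requirement_gpu_cuda12_9(name)],
        # Explicit ARM branch so cuda12_arm no longer falls through
        # to the x86 cuda12_9 lockfile.
        "@rtp_llm//:using_cuda12_arm": [requirement_cuda12_arm(name)],
        "@rtp_llm//:using_rocm": [requirement_gpu_rocm(name)],
        "//conditions:default": [requirement_gpu_cuda12_9(name)],
    })
```

The alternative resolution named in the same suggestion is to drop the cuda12_arm config entirely rather than wire it up.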

Checklist Violations (3 fail / 25 total)

General Principles Checklist

  • [6.1] Architecture — Error semantics: fail-fast/retry/fallback/silent behavior must be explicit → issue: the requirement() default branch routes cuda12_arm to the x86 cuda12_9 wheel
    The arch_select.bzl default branch swallows unknown configs (including cuda12_arm) and silently routes them to the cuda12_9 (x86) wheel with no fail-fast check; downstream builds only blow up at run/import time, after a wrong-architecture wheel has already been pulled.
  • [6.1] Architecture — Compatibility: public API/persistent data/config/environment migrations must be safe → issue: requirement()'s default branch downloads the cuda12_9 wheel, hurting the pure-CPU test experience
    The public build entry points --config=cuda12_2/cuda12_6/cpu/arm are removed in one shot; external users and historical scripts pinned to these configs will fail immediately (the docs were updated, but open-source users' old scripts cannot know that). The PR description includes no migration guide or deprecation notice.
  • [6.1] Tests — Distributed/cross-platform changes need matching coverage → issue: the requirement() default branch routes cuda12_arm to the x86 cuda12_9 wheel
    The PR touches the cross-architecture (x86 / aarch64 / rocm) dependency graph, but the PR description/commit message provides no evidence that cuda12_9_arm and rocm CI builds pass; whether arch_select.bzl still routes cuda12_arm correctly after the consolidation needs independent verification.

Strengths

  • Dead configs such as cuda12_2/cuda12_6/cpu/arm/cpu_latest/cuda_pre_12_9 and their select branches are removed wholesale, noticeably cutting future BUILD/select maintenance cost.
  • The strict-separation comment at the top of pip.bzl spells out the roles of artlab, aliyun, download.pytorch.org, and the rtp-opensource OSS bucket, plus why the SJTU mirror cannot be used, making the decision traceable.
  • The URL-pin → PEP 503 index migration converges wheel sources across internal and open-source builds, makes caches reusable, and switches lockfile generation to uv (more compact output).
  • The new rtp_opensource_deps handle and the retained explicit exports_files declarations make the data flow where internal_source reuses the opensource lockfiles via --override_repository explicit and readable.
  • The pip_ppu_torch stub comment in WORKSPACE clearly states that opensource-only builds never actually download a PPU wheel, so external users are not blocked on PPU resources.
  • Docs (3fs/benchmark/debug/profiling/install) and perf_test scripts (multi_runner.sh, multi_benchmark_config.yaml) are updated from cuda12_6 → cuda12_9 in sync, leaving no dead references.

@wht21
Collaborator

wht21 commented May 1, 2026

internal source has been updated, please review the changes!

LLLLKKKK added a commit that referenced this pull request May 1, 2026
utils/util.py imports aiohttp but the py_library target never declared the
pip dep. Surfaced on ut-sm8x in PR #962 (util_test, duplicated_kv_test →
ModuleNotFoundError: No module named 'aiohttp'). Likely latent for a while
— only visible when cuda12_9_x86 with a strict py_test runfiles sandbox
runs these targets.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 1, 2026 18:04
Contributor

Copilot AI left a comment


Pull request overview

Unifies Bazel/rules_python dependency resolution around a single set of PEP 503 indexes (Aliyun PyPI mirror + download.pytorch.org per backend + RTP OSS “simple” index for custom wheels), while also standardizing the repo’s default CUDA build path onto cuda12_9 and regenerating lockfiles accordingly.

Changes:

  • Reworked deps/pip.bzl + requirements inputs/lockfiles to resolve from PEP 503 indexes (and removed legacy CPU / older CUDA12 inputs).
  • Updated Bazel config surface to remove cuda12_6 / CPU / ARM-CPU settings and align scripts/docs/tests on cuda12_9.
  • Added an opensource deps repository handle (rtp_opensource_deps) for internal overlay reuse.

Reviewed changes

Copilot reviewed 30 out of 33 changed files in this pull request and generated 2 comments.

File Description
rtp_llm/test/perf_test/multi_node/multi_runner.sh Switches default multi-node build/test config from cuda12_6 → cuda12_9.
rtp_llm/test/perf_test/multi_node/multi_benchmark_config.yaml Updates benchmark build args to cuda12_9.
rtp_llm/models_py/standalone/BUILD Drops cuda_pre_12_9 select branches in standalone deps.
rtp_llm/models_py/modules/hybrid/test/BUILD Removes cuda_pre_12_9 flashinfer deps branch.
rtp_llm/models_py/modules/factory/attention/cuda_impl/test/BUILD Removes cuda_pre_12_9 flashinfer deps branches for tests.
rtp_llm/models_py/modules/factory/attention/cuda_cp_impl/test/BUILD Removes cuda_pre_12_9 flashinfer deps branches for CP tests.
rtp_llm/models_py/kernels/cuda/test/BUILD Removes commented cuda_pre_12_9 select branch.
rtp_llm/BUILD Removes legacy ARM/CPU selects, adds aiohttp to deps list, and keeps CUDA12.9/ARM/ROCm selects.
docs/start/install.md Updates example Bazel build command to cuda12_9.
docs/references/profiling.md Updates profiling example to cuda12_9.
docs/references/debug.md Updates debug/test example to cuda12_9.
docs/benchmark/benchmark.md Updates benchmark doc snippet to cuda12_9.
docs/backend/3fs.md Updates 3FS build example to cuda12_9.
deps/requirements_torch_gpu_cuda12_9.txt Converts to “thin” named requirements resolving via unified indexes.
deps/requirements_torch_gpu_cuda12.txt Removes legacy CUDA12 (cu126) input requirements file.
deps/requirements_torch_cpu.txt Removes legacy CPU torch requirements file.
deps/requirements_rocm.txt Converts ROCm requirements to named deps aligned with unified indexes.
deps/requirements_lock_torch_gpu_cuda12_9.txt Regenerated lockfile using the new index set and inputs.
deps/requirements_lock_rocm.txt Regenerated ROCm lockfile using the new index set and inputs.
deps/requirements_lock_cuda12_arm.txt Regenerated CUDA12 ARM lockfile using the new index set and inputs.
deps/requirements_cuda12_arm.txt New thin CUDA12 ARM requirements source.
deps/requirements_cpu_arm.txt Removes legacy ARM CPU requirements file.
deps/pip.bzl Centralizes index config (explicit --index-url + extra indexes) and streamlines pip_parse repos to CUDA12.9/ARM/ROCm (+ PPU stub).
deps/http.bzl Removes legacy torch http_archives (cpu + cu126), keeps torch 2.8 CUDA archive and other deps.
deps/BUILD Exports per-config requirements sources and removes compile rules for deleted CPU/cu126 variants.
arch_config/arch_select.bzl Simplifies requirement selection toward CUDA12.9/ROCm and drops legacy CPU/cu126 selects.
WORKSPACE Adds rtp_opensource_deps repo handle and updates pip install_deps set (CUDA12.9/ARM/ROCm).
BUILD.pytorch Removes pre-12.9 CUDA linkopts branch.
BUILD Removes cuda_pre_12_9, using_arm, using_cpu config_settings; updates compdb config.
.bazelrc Removes cuda12_2, cuda12_6, cpu, and arm configs; keeps cuda12_9_* and ROCm.


Comment on lines 15 to 22
```diff
 deps = select({
-    "@rtp_llm//:cuda_pre_12_9": [requirement_gpu_cuda12(name)],
     "@rtp_llm//:using_cuda12_9_x86": [requirement_gpu_cuda12_9(name)],
     "@rtp_llm//:using_rocm": [requirement_gpu_rocm(name)],
-    "@rtp_llm//:using_arm": [requirement_arm(name)],
-    "//conditions:default": [requirement_cpu(name)],
+    # Default falls through to cuda12_9 (the canonical x86 GPU build).
+    # CPU-only and ARM-CPU configs were removed; if you need them,
+    # restore the corresponding pip_parse + lockfile + select branch.
+    "//conditions:default": [requirement_gpu_cuda12_9(name)],
 }),

-    "@rtp_llm//:using_cuda12": ["torch==2.6.0+cu126"],
     "@rtp_llm//:using_rocm": ["pyrsmi==0.2.0", "amdsmi@https://sinian-metrics-platform.oss-cn-hangzhou.aliyuncs.com/kis%2FAMD%2Famd_smi%2Fali%2Famd_smi.tar", "aiter@https://sinian-metrics-platform.oss-cn-hangzhou.aliyuncs.com/kis/AMD/RTP/aiter-0.1.13.dev14%2Bgfa35072d0.d20260402-cp310-cp310-linux_x86_64.whl"],
-    "//conditions:default": ["torch==2.1.2"],
+    # Default covers cuda12_9_x86, cuda12_9_arm (both use torch 2.8+cu129).
```
@LLLLKKKK
Collaborator Author

LLLLKKKK commented May 1, 2026

AI Code Review - PR #962

Status: LGTM

Summary: P0/0 · P1/0 · P2/0 · P3/0

lgtm ready to ci

Checklist ✅ (25 items passed)

Strengths

  • The incremental fix is precisely targeted: utils/util.py:11 imports aiohttp at top level but the //rtp_llm:utils py_library never declared the dependency; this commit appends a single :aiohttp line to resolve the ModuleNotFoundError without introducing extra coupling.
  • The commit message states the root cause (missing dep declaration), the trigger condition (cuda12_9_x86 strict py_test runfiles sandbox), and the affected tests (util_test, duplicated_kv_test), which will ease future bisects.
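The one-line fix described above would look something like this in rtp_llm/BUILD. This is a sketch: the surrounding srcs/deps are illustrative, and only the :aiohttp line corresponds to the actual change.

```starlark
# rtp_llm/BUILD (sketch) -- declare the pip dep that utils/util.py imports.
py_library(
    name = "utils",
    srcs = glob(["utils/**/*.py"]),  # illustrative; actual srcs may differ
    deps = [
        ":aiohttp",  # the added line: utils/util.py does `import aiohttp`
        # ... other existing deps ...
    ],
)
```

Declaring the dep directly on the target that imports it (rather than relying on a transitive provider) is what makes the target survive a strict runfiles sandbox.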

@wht21
Collaborator

wht21 commented May 1, 2026

internal source has been updated, please review the changes!

…CE/utils fixes

Squashed roll-up of the pip-unification work for the opensource side.

Strict separation invariant (Phase 2):
- Opensource builds may only consume mirrors.aliyun.com/pypi/simple/ +
  rtp-opensource OSS + download.pytorch.org/whl/<cfg>/.
- Internal builds may only consume artlab.alibaba-inc.com + rtp-opensource OSS.
- The rtp-opensource OSS bucket is the SINGLE shared mirror for RTP-LLM
  custom wheels (flash_attn, deep_ep, deep_gemm, flashinfer_*, rtp_kernel,
  etc.) — same URL serves both sides.

Build cfg cleanup:
- Drop cuda12 (cu126/torch2.6) build path entirely; cuda12_9 (cu129/torch2.8)
  is now the canonical x86 GPU build. cuda_pre_12_9 config_setting + every
  select branch that used it removed (latent ABI mismatch: cu126 native libs
  with torch 2.8 Python deps from the cu129 pip_parse).
- Drop using_arm / using_cpu config_settings + their select branches —
  --config=arm / --config=cpu bazelrc entries were removed with the lockfiles
  so the [] branches were unreachable and default branches (decord,
  xfastertransformer_devel) silently applied.
- Drop dead pip_ppu_torch lockfile registration from opensource pip.bzl
  (no using_ppu select branch in opensource arch_select.bzl).
- Drop unused cu126 + cpu --extra-index-url from opensource pip.bzl —
  halves per-package query surface against download.pytorch.org and
  reduces regen timeout flakiness.
- Drop CPU-only and ARM-CPU build paths (requirements_torch_cpu.txt,
  requirements_torch_arm.txt, requirements_cpu_arm.txt) — no bazel cfg
  wires them up anymore.
- Drop build:cuda12_2 / build:cuda12_6 from .bazelrc; retarget multi-node
  perf test scripts/yaml + 5 docs (install/benchmark/debug/profiling/3fs)
  to --config=cuda12_9.
- Delete torch_2.6_py310_cuda http_archive + cuda_pre_12_9 branch in
  BUILD.pytorch linkopts.

Shared-wheel hygiene:
- aarch64 wheels retagged manylinux_2_28_aarch64 (uv refuses bare
  linux_aarch64 under aarch64-manylinux python-platform) and uploaded to OSS
  for direct version-pin.
- URL-pin patched wheels to OUR OSS so resolver can't silently swap to an
  upstream copy with different bytes (flash_attn 2.8.3+cu12torch2.8cxx11abiTRUE,
  flashinfer-python 0.6.6, flashinfer-cubin 0.6.6 — we host patched variants).
- nvidia-cutlass-dsl pinned 4.4.1 (latest on artlab; hash matches across
  artlab + aliyun mirror); pulls libs-base==4.4.1 transitive.

Whl_deps tightening:
- whl_deps() select key changed from :using_cuda12 → :using_cuda12_9_x86 +
  default. The :using_cuda12 key over-matched every CUDA variant
  (cuda12_9 inherits build:cuda12 which sets using_cuda12=true), so cu129
  wheels were baking torch==2.6.0+cu126 into install_requires.

WORKSPACE/pip stub:
- WORKSPACE unconditionally calls pip_ppu_torch_install_deps(). Internal
  builds satisfy that via --override_repository=rtp_deps but opensource
  builds don't — so opensource-only builds errored 'Failed to load
  Starlark extension @pip_ppu_torch//:requirements.bzl'. Register
  pip_ppu_torch in opensource pip.bzl as a lazy alias of the cuda12_9
  lockfile (pip_parse is lazy, no opensource select depends on it, so
  no PPU wheel is ever fetched).

py_library deps:
- //rtp_llm:utils now declares :aiohttp. utils/util.py imports aiohttp
  but the target had no pip dep — surfaced as ModuleNotFoundError on
  ut-sm8x's util_test + duplicated_kv_test once the strict py_test
  runfiles sandbox started enforcing it.

Verified end-to-end on Aone CI pipeline 1346, run 39069908: all 28 jobs
SUCCESS (cuda12_9 / cuda12_9_arm / amd / ppu / frontend builds + ut +
smoke + perf + open_source variants).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
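For readers who have not opened deps/pip.bzl, the unified index set described in the commit message would look roughly like this. The OSS bucket URL is a placeholder and the variable name is assumed, not copied from the file.

```starlark
# Sketch of the unified PEP 503 index set (opensource side); names and
# exact URLs are illustrative, and the rtp-opensource bucket URL is a
# placeholder.
PIP_EXTRA_ARGS = [
    # Primary index: Aliyun PyPI mirror.
    "--index-url=https://mirrors.aliyun.com/pypi/simple/",
    # Per-backend torch wheels; cu129 is the canonical x86 GPU config.
    "--extra-index-url=https://download.pytorch.org/whl/cu129",
    # Single shared mirror for RTP-LLM custom wheels
    # (flash_attn, deep_ep, deep_gemm, flashinfer_*, rtp_kernel, ...).
    "--extra-index-url=https://<rtp-opensource-oss-bucket>/simple/",
]
```

With PEP 503 indexes, the resolver matches wheels by filename under `<index>/<normalized-package-name>/`, which is what lets the same pinned versions resolve identically on every platform.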
@LLLLKKKK LLLLKKKK force-pushed the feature/pip_unify_v2 branch from fe60498 to d0a6fd8 Compare May 2, 2026 01:53
@LLLLKKKK
Collaborator Author

LLLLKKKK commented May 2, 2026

AI Code Review - PR #962

Status: LGTM

Summary: P0/0 · P1/0 · P2/1 · P3/2

lgtm ready to ci

Non-blocking Suggestions

P2

  • The cuda12_arm dependency set expands substantially; aarch64 wheel compatibility needs verification @ deps/requirements_cuda12_arm.txt:1
    • Suggestion: before merging, run smoke tests plus basic import checks on the cuda12_9_arm platform (at least flash-attn / flash-attn-3 / deep-gemm / flashinfer-python), or state in the PR description which packages are intentionally stub-resolved (never triggered by an import at runtime).

P3

  • arch_select.bzl whl_deps()'s default fallback changes from CPU torch to CUDA torch @ arch_config/arch_select.bzl:55
    • Suggestion (optional): replace the default branch with a no_match_error, or keep explicit cuda12_9_x86/cuda12_9_arm branches plus a default that fails, so a new platform missing a branch errors at bazel analysis time instead of silently installing cu129 packages.
  • rtp-kernel build timestamps drift between the cuda12_arm and cuda12_9 lockfiles @ deps/requirements_lock_cuda12_arm.txt:1
    • Suggestion: align the rtp-kernel version in both lockfiles (the same build artifact should carry the same timestamp), or fix the update_pip flow to always use one timestamp.
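The fail-fast variant suggested in the first P3 item can be expressed with select's no_match_error. A sketch only: the branch values mirror the surrounding discussion and are not verified against the file.

```starlark
# Sketch: make a missing platform branch fail at bazel analysis time
# instead of silently resolving to cu129 wheels.
def whl_deps():
    return select(
        {
            "@rtp_llm//:using_cuda12_9_x86": ["torch==2.8.0+cu129"],
            "@rtp_llm//:using_rocm": ["pyrsmi==0.2.0"],
            # Deliberately no "//conditions:default" branch.
        },
        no_match_error = "whl_deps(): no torch pin for this platform; " +
                         "add an explicit branch rather than relying on a default.",
    )
```

Bazel's select() raises the given no_match_error during analysis whenever no key matches, which surfaces a mis-configured platform long before any wheel is downloaded.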

Checklist Violations (1 fail / 36 total)

RTP-LLM Checklist

  • [H] Tests & CI — Test coverage sufficient: equivalence coverage for large refactors, end-to-end tests for new features → issue: the cuda12_arm dependency set expands substantially; aarch64 wheel compatibility needs verification
    The cuda12_arm dependency list adds flash-attn / flash-attn-3 / deep-ep / deep-gemm / nvidia-cutlass-dsl and more, yet the PR diff contains no smoke/import verification evidence for the cuda12_9_arm platform.

Strengths

  • A large cleanup PR in which every deletion (cuda12_2/cuda12_6/cpu/arm configs, requirements_cpu_arm.txt, etc.) has matching updates in .bazelrc / BUILD / WORKSPACE / arch_select.bzl, leaving no dangling references.
  • The deps/pip.bzl comments are thorough, explaining the PIP_EXTRA_ARGS index selection strategy and why the pip_ppu_torch stub must be kept under the internal/opensource --override_repository scheme.
  • WORKSPACE adds rtp_opensource_deps with a comment clarifying its role separation from rtp_deps under the internal overlay.
  • Docs (install.md / debug.md / profiling.md / 3fs.md / benchmark.md) and perf test scripts have their cuda12_6 references updated in sync, keeping the examples from drifting away from the build system.
  • rtp_llm/BUILD makes utils' implicit aiohttp dependency explicit (rtp_llm/utils/util.py already imports aiohttp), eliminating a fragile dependency carried transitively through other packages.
