feat(deps): unify pip deps via PEP 503 indexes + thin requirements#962

Open
LLLLKKKK wants to merge 1 commit into main from feature/pip_unify_v2

Conversation

@LLLLKKKK
Collaborator

Summary

Switches the build to a single set of PEP 503 indexes so the same wheel filenames and versions resolve cleanly across all platforms.

  • pip.bzl: thin pip-tools index list (download.pytorch.org per-CUDA, aliyun PyPI mirror, plus a public OSS bucket for the custom flash_attn / deep_ep / deep_gemm / flashinfer / rtp_kernel wheels we publish)
  • requirements_*.txt: thin per-platform inputs that share requirements_base.txt
  • lockfiles regenerated against the unified indexes
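Concretely, the thin-input layout described above would look something like this (contents illustrative, not the actual files — only pins already discussed in this PR are shown):

```text
# deps/requirements_base.txt — shared, platform-independent pins
apache-tvm-ffi==0.1.1

# deps/requirements_torch_gpu_cuda12_9.txt — thin per-platform input
-r requirements_base.txt
torch==2.8.0+cu129        # resolved via the download.pytorch.org cu129 index
flashinfer-python==0.6.0  # resolved via the OSS simple/ bucket
```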

Test plan

  • cuda12_9 bazel build passes against the new lockfile
  • CI on this PR

Copilot AI review requested due to automatic review settings April 30, 2026 12:53
Contributor

Copilot AI left a comment


Pull request overview

Unifies Python dependency resolution by switching from direct wheel URLs to package/version pins intended to be served via PEP 503 “simple” indexes, and wires Bazel pip parsing to use a shared index list across platforms.
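PEP 503 "simple" indexes only work across platforms because project names are normalized before lookup; a minimal sketch of that normalization rule (taken from the PEP), which is what lets spellings like `flash_attn` and `flash-attn` resolve to the same index entry:

```python
import re


def normalize(name: str) -> str:
    # PEP 503: collapse runs of "-", "_", "." to a single "-", then lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


print(normalize("Flash_Attn"))   # flash-attn
print(normalize("deep_ep"))      # deep-ep
```

Any custom index serving the flash_attn / deep_ep / rtp_kernel wheels must expose them under these normalized names for pip/uv to find them.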

Changes:

  • Replaced many direct wheel URLs in per-platform deps/requirements_*.txt with pinned package names/versions.
  • Expanded deps/pip.bzl PIP_EXTRA_ARGS to include the unified set of extra PEP 503 indexes (rtp-opensource simple + PyTorch per-ABI indexes + mirror).
  • Added a stable WORKSPACE repository alias (rtp_opensource_deps) intended to keep a handle to opensource deps/ when rtp_deps is overridden internally.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
deps/requirements_torch_gpu_cuda12_9.txt Switches CUDA 12.9 GPU deps from direct URLs to pinned names/versions.
deps/requirements_torch_gpu_cuda12.txt Switches CUDA 12.6 GPU deps from direct URLs to pinned names/versions.
deps/requirements_torch_cpu.txt Updates CPU Torch pin and removes direct wheel URLs.
deps/requirements_rocm.txt Switches ROCm deps to pinned names/versions and adds amd-smi.
deps/requirements_cuda12_arm.txt Switches CUDA12 ARM deps from direct URLs to pinned names/versions.
deps/requirements_cpu_arm.txt Replaces direct ARM CPU Torch wheel URL with a version pin.
deps/pip.bzl Centralizes pip index configuration via a shared extra-index list.
deps/http.bzl Adds TODO commentary about consolidating duplicated http_archive entries.
WORKSPACE Adds rtp_opensource_deps local_repository alias for internal override scenarios.


Comment thread deps/http.bzl Outdated
# Consolidate here as multi-URL lists (artlab mirror first, public URL fallback);
# requires per-config `bazel build` verification before landing to catch URL
# availability regressions.
# See /home/liukan.lk/.claude/plans/serialized-wibbling-snail.md Phase 5.
@LLLLKKKK
Collaborator Author

AI Code Review - PR #962

Status: BLOCKING

Summary: P0/0 · P1/3 · P2/4 · P3/0

Blocking Issues

P1

  • deps/http.bzl comment leaks an internal developer machine's absolute path and a Claude plan filename @ deps/http.bzl:11
    • Suggestion: delete the line, or keep only a neutral description (e.g. # See internal_source pip_unify Phase 5 plan) and move the plan itself into internal_source/ documentation
  • Lockfiles are out of sync with the new requirements; pip_parse on several platforms still resolves the old versions @ deps/requirements_torch_gpu_cuda12.txt:9
    • Suggestion: regenerate and commit the lockfiles for every affected platform (cuda12/cuda12_arm/rocm/torch_cpu/torch_arm); at minimum run the resolution phase of bazel build //... locally per platform to verify lockfile and input agree
  • cuda12 requirements drop nvidia-nvshmem-cu12 but deep-ep still depends on the NVSHMEM runtime @ deps/requirements_torch_gpu_cuda12.txt:6
    • Suggestion: verify whether the deep-ep 1.2.1.10 wheel bundles libnvshmem; if not, restore the explicit nvidia-nvshmem-cu12==3.4.5 dependency and add a cuda12 MoE smoke test
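The bundling check suggested for deep-ep can be scripted: a wheel is a zip archive, so listing its members shows whether a shared library is vendored in. A minimal sketch (the wheel filename and library name below are assumptions for illustration):

```python
import zipfile


def wheel_bundles_lib(wheel_file, needle: str) -> bool:
    """True if any member of the wheel (a zip archive) contains `needle`."""
    with zipfile.ZipFile(wheel_file) as zf:
        return any(needle in member for member in zf.namelist())


# Usage (hypothetical wheel filename):
# wheel_bundles_lib("deep_ep-1.2.1.10-cp310-cp310-linux_x86_64.whl", "libnvshmem")
```

If this returns False for the published wheel, the explicit nvidia-nvshmem-cu12 pin is still required.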

Non-blocking Suggestions

P2

  • cuda12_arm torchvision==0.24.0 carries no local label and may resolve to a CPU or wrong-platform wheel @ deps/requirements_cuda12_arm.txt:7
    • Suggestion: rewrite as torchvision==0.24.0+cu129 or keep the explicit wheel URL; after lockfile regen, grep to confirm the wheel URL contains cu129 and aarch64
  • cuda12 flashinfer-python jumps from 0.2.5 to 0.6.0 without caller-side verification @ deps/requirements_torch_gpu_cuda12.txt:9
    • Suggestion: merge only after the cuda12 smoke suite (mha/mla/MoE) passes; or list the confirmed caller-side adaptation status in the PR description
  • Input requirements lose their explicit version pins; future lockfile regens may introduce surprise upgrades @ deps/requirements_torch_gpu_cuda12.txt:2
    • Suggestion: keep lower bounds or exact pins in the input requirements (e.g. autoawq>=0.2.9, apache-tvm-ffi==0.1.1) so they form a second line of defense alongside the lockfile
  • The PR test plan covers only cuda12_9; the other 4 platform builds are unverified locally @ deps/pip.bzl:10
    • Suggestion: before merging, run at least the cuda12/rocm/cpu_arm bazel builds; or state explicitly in the PR description that full verification is delegated to CI before merge

Checklist Violations (5 fail / 23 total)

General Principles Checklist

  • [6.1] Architecture — layering boundaries: new concepts live at the right layer, no internal leakage → issue: deps/http.bzl comment leaks an internal developer machine's absolute path and a Claude plan filename
    The comment at deps/http.bzl:11 embeds a personal plan markdown filename under a local home directory's .claude/plans/, leaking internal/personal development-environment paths into opensource deps/ and violating the layering boundary
  • [6.1] Architecture — compatibility: public API / persisted data / config / environment migrations are safe → issue: lockfiles are out of sync with the new requirements; pip_parse on several platforms still resolves the old versions
    The requirements upgrade several core wheels such as deep-ep/flashinfer-python and simultaneously drop nvidia-nvshmem-cu12, but the lockfiles were not regenerated in sync; downstream caller adaptation is unverified
  • [6.1] Tests — new logic has focused unit tests plus related integration/smoke tests → issue: the PR test plan covers only cuda12_9; the other 4 platform builds are unverified locally
    The test plan checks only the cuda12_9 bazel build; none of the five pip_parse paths (cuda12/cuda12_arm/rocm/cpu/cpu_arm) was verified locally, and the CI box is unchecked
  • [6.1] Tests — distributed/cross-platform changes have matching coverage → issue: the PR test plan covers only cuda12_9; the other 4 platform builds are unverified locally
    The diff affects lockfile resolution behavior on 5 platforms, but the test plan covers only the cuda12_9 build
  • [6.1] Quality — commits are atomic and the message matches the behavior → issue: lockfiles are out of sync with the new requirements; pip_parse on several platforms still resolves the old versions
    The commit message says "lockfiles regenerated against the unified indexes", but the diff contains no requirements_lock*.txt files, so the message does not match the behavior

Strengths

  • The new index-provenance comment at the top of pip.bzl spells out the role and coverage of the three source classes (download.pytorch.org / rtp-opensource / aliyun), easing future maintenance
  • The rtp_opensource_deps local_repository added to WORKSPACE carries a comment explaining its relationship to rtp_deps and the .internal_bazelrc --override_repository, guarding against accidental later edits
  • Switching the multi-platform requirements from direct wheel URLs to PEP 503 name==version+local form reduces the maintenance cost of wheel URL changes

@LLLLKKKK
Collaborator Author

LLLLKKKK commented May 1, 2026

AI Code Review - PR #962

Status: BLOCKING

Summary: P0/0 · P1/1 · P2/0 · P3/1

Blocking Issues

P1

  • The regenerated lockfile leaks a developer's absolute path @ deps/requirements_lock_torch_gpu_cuda12_9.txt:143
    • Suggestion: when rerunning update_pip.sh, make sure the cwd is github-opensource/ or pass --directory / relative source-file paths explicitly to uv pip compile so the generated # via -r ... comments are relative; grep for home/ before committing. Otherwise running bazel run //deps:requirements_*.update on another machine produces a different diff, breaking reproducibility, and bakes a developer username (identifiable as an employee) into the opensource lockfile.
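The grep-for-home check suggested here can be automated as a pre-commit gate; a minimal sketch that flags `# via` provenance comments pointing at absolute paths (the comment format below matches pip-compile/uv output, the marker choices are assumptions):

```python
import re


def find_absolute_via_paths(lock_text: str) -> list:
    """Return '# via -r ...' comment lines whose referenced path is absolute."""
    bad = []
    for line in lock_text.splitlines():
        stripped = line.strip()
        # uv / pip-compile provenance comments look like "# via -r <path>".
        if stripped.startswith("#") and "-r " in stripped:
            path = stripped.split("-r ", 1)[1].split()[0]
            if path.startswith("/") or re.search(r"/home/", path):
                bad.append(stripped)
    return bad
```

Running this over each requirements_lock_*.txt and failing on a non-empty result would have caught the leak before review.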

Non-blocking Suggestions

P3

  • apache-tvm-ffi versions drift between lockfiles @ deps/requirements_lock_cuda12_arm.txt:159
    • Suggestion: pin the apache-tvm-ffi version explicitly in requirements_base.txt or the corresponding thin requirements so per-platform lockfiles cannot drift with resolver choices; if the divergence is intentional, add a comment.

Checklist Violations (1 fail / 23 total)

General Principles Checklist

  • [6.1] Quality — logic changes carry no unrelated formatting noise → issue: the regenerated lockfile leaks a developer's absolute path
    In 4 lockfiles the # via comments changed from repo-relative paths such as requirements_base.txt to absolute path fragments containing a developer username (about 262 occurrences). This is unrelated noise mixed into a versioned artifact, and it diffs with every generating machine, breaking reproducibility.

Strengths

  • The new strict-separation comment above pip.bzl/PIP_EXTRA_ARGS clearly explains the --index-url vs --extra-index-url semantics, the hard constraint that artlab must never be exposed, the SJTU mirror's 301→403 compatibility trap, and why download.pytorch.org was chosen, helping future maintainers judge the impact of adding a new index.
  • The BUILD comment on keeping using_arm/using_cpu as label-only config_settings explains that select() branches still reference these labels but will never match again, preventing a later maintainer from deleting them and breaking unresolved labels in downstream BUILD files.
  • WORKSPACE introduces rtp_opensource_deps as a stable handle and explains its relationship to rtp_deps (replaced by internal_source via --override_repository), so the internal overlay can still -r include the opensource requirements and shared http_archives.
  • The commit message fully records the abandoned SJTU mirror experiment, the list of deleted CPU/ARM build paths, and the corresponding cleanup points, making phase 5 (http_archive multi-URL consolidation) easy to trace later.

Copilot AI review requested due to automatic review settings May 1, 2026 08:00
Contributor

Copilot AI left a comment


Pull request overview

This PR updates the Bazel Python dependency flow to resolve wheels consistently across platforms by switching to a unified set of PEP 503 indexes (Aliyun PyPI mirror + download.pytorch.org per-accelerator + an OSS “simple/” index for custom wheels), and regenerates lockfiles accordingly.

Changes:

  • Reworked deps/pip.bzl to use a consolidated index list and simplified (thin) requirements inputs, with lockfiles regenerated by uv.
  • Removed CPU-only and ARM-CPU pip/lock flows and Bazel configs, and adjusted default dependency selection to fall through to CUDA 12.9.
  • Regenerated CUDA12.9 / ROCm / CUDA12 ARM lockfiles to match the unified index configuration and new “thin requirements” approach.

Reviewed changes

Copilot reviewed 16 out of 19 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
deps/requirements_torch_gpu_cuda12_9.txt Converts CUDA12.9 requirements from direct wheel URLs to pinned packages + PEP 503/simple URLs where needed.
deps/requirements_rocm.txt Updates ROCm requirements to use unified indexes and thin inputs.
deps/requirements_cuda12_arm.txt Updates CUDA12 ARM requirements to the new thin input style and unified indexes.
deps/requirements_lock_torch_gpu_cuda12_9.txt Regenerated CUDA12.9 lockfile via uv, embedding unified index configuration.
deps/requirements_lock_rocm.txt Regenerated ROCm lockfile via uv, embedding unified index configuration.
deps/requirements_lock_cuda12_arm.txt Regenerated CUDA12 ARM lockfile via uv, embedding unified index configuration.
deps/pip.bzl Centralizes pip index config and updates pip_parse repos to the new lockfiles.
deps/BUILD Removes deprecated compile targets and keeps only CUDA12.9 / ROCm / CUDA12 ARM compile targets.
deps/http.bzl Removes CPU torch archives and adds a TODO about consolidating duplicate http_archives across open/internal.
arch_config/arch_select.bzl Drops CPU/ARM-CPU branches and changes default selection to CUDA12.9.
WORKSPACE Adds rtp_opensource_deps and removes CPU/ARM-CPU pip repo installs.
BUILD Notes that using_arm / using_cpu config_settings are now label-only and never match.
.bazelrc Removes --config=cpu and --config=arm build configs.
Comments suppressed due to low confidence (2)

deps/pip.bzl:46

  • pip_deps() no longer declares a pip_parse repo for CUDA12 (cu126), but arch_config/arch_select.bzl still references @pip_gpu_cuda12_torch for cuda_pre_12_9. This will break --config=cuda12 (and any other path that needs the cu126 Python deps). Restore a pip_gpu_cuda12_torch pip_parse with a cu126 lockfile, or remove/retarget the cuda_pre_12_9 dependency path so it doesn’t reference a missing repo.
def pip_deps():
    pip_parse(
        name = "pip_ppu_torch",
        requirements_lock = "@rtp_deps//:requirements_lock_torch_gpu_cuda12_9.txt",
        python_interpreter = "/opt/conda310/bin/python3",
        extra_pip_args = PIP_EXTRA_ARGS,
        timeout = 3600,
    )

    pip_parse(
        name = "pip_gpu_cuda12_9_torch",
        requirements_lock = "@rtp_deps//:requirements_lock_torch_gpu_cuda12_9.txt",
        python_interpreter = "/opt/conda310/bin/python3",
        extra_pip_args = PIP_EXTRA_ARGS,
        timeout = 3600,
        quiet = False,
    )

WORKSPACE:56

  • WORKSPACE no longer loads/calls pip_gpu_cuda12_torch_install_deps(), but the repo is still referenced from arch_config/arch_select.bzl (@pip_gpu_cuda12_torch). If CUDA12 (cu126) remains supported, pip_deps() and WORKSPACE both need to create/install that repo; if it’s being dropped, please also remove the remaining pip_gpu_cuda12_torch references and associated select branches/configs so workspace loading can’t fail.
load("@rtp_deps//:pip.bzl", "pip_deps")

pip_deps()

load("@pip_ppu_torch//:requirements.bzl", pip_ppu_torch_install_deps = "install_deps")
pip_ppu_torch_install_deps()

load("@pip_gpu_cuda12_9_torch//:requirements.bzl", pip_gpu_cuda12_9_torch_install_deps = "install_deps")
pip_gpu_cuda12_9_torch_install_deps()

load("@pip_cuda12_arm_torch//:requirements.bzl", pip_cuda12_arm_torch_install_deps = "install_deps")
pip_cuda12_arm_torch_install_deps()


Comment thread deps/http.bzl Outdated
Comment on lines +10 to +11
# availability regressions.
# See /home/liukan.lk/.claude/plans/serialized-wibbling-snail.md Phase 5.
Comment thread arch_config/arch_select.bzl Outdated
# to wrapper target relate with different system config
load("@pip_cpu_torch//:requirements.bzl", requirement_cpu="requirement")
load("@pip_arm_torch//:requirements.bzl", requirement_arm="requirement")
load("@pip_gpu_cuda12_torch//:requirements.bzl", requirement_gpu_cuda12="requirement")
@LLLLKKKK
Collaborator Author

LLLLKKKK commented May 1, 2026

AI Code Review - PR #962

Status: BLOCKING

Summary: P0/0 · P1/2 · P2/2 · P3/0

Blocking Issues

P1

  • deps/http.bzl comment leaks a developer's local path and an internal plan filename @ deps/http.bzl:10
    • Suggestion: delete the See ... line, or point it at a publicly accessible issue/PR number instead (e.g. # See PR #962 / linked tracking issue).
  • # via comments in 4 lockfiles contain a github-opensource/deps/ path prefix (regenerated from the wrong cwd) @ deps/requirements_lock_torch_gpu_cuda12_9.txt:140
    • Suggestion: regenerate all 4 lockfiles from inside the github-opensource/ subdirectory (or set the working directory explicitly in compile_pip_requirements extra_args) so the # via annotations are relative paths of the form -r requirements_base.txt.

Non-blocking Suggestions

P2

  • pip_ppu_torch switches to the cuda12_9 lockfile, changing the PPU platform's torch version @ deps/pip.bzl:30
    • Suggestion: before merging, run a bazel build plus a basic smoke test on the PPU platform to confirm that after the switch all requirement(...) transitives still resolve and the PPU path is not dragged into cu129 torch binary dependencies; if PPU must stay on torch 2.6, distinguish it explicitly in the internal arch_select.
  • WORKSPACE adds the rtp_opensource_deps alias but this PR contains no consumer @ WORKSPACE:21
    • Suggestion: link the concrete internal usage of @rtp_opensource_deps in the PR description/commit message; if the internal change is not yet submitted, consider splitting this out and landing it once the internal side is ready, to avoid introducing a temporarily unreferenced repository alias.

Checklist Violations (4 fail / 23 total)

General Principles Checklist

  • [6.1] Software Engineering — KISS/YAGNI: no speculative abstraction → issue: WORKSPACE adds the rtp_opensource_deps alias but this PR contains no consumer
    WORKSPACE adds the rtp_opensource_deps local_repository alias, but the PR diff contains no @rtp_opensource_deps//... reference; the internal overlay consumer promised by the comment does not appear in this PR.
  • [6.1] Architecture — layering boundaries: new concepts live at the right layer, no internal leakage → issue: deps/http.bzl comment leaks a developer's local path and an internal plan filename
    The new comment in deps/http.bzl introduces a developer's local home-directory absolute path and an internal planning filename, leaking internal state into opensource source comments.
  • [6.1] Architecture — compatibility: public API / persisted data / config / environment migrations are safe → checklist-only
    After --config=cpu and --config=arm are removed from .bazelrc, downstream users still referencing these configs will not fail immediately at build time but silently fall back to the default (now switched to cu129). The BUILD comment documents that keeping using_arm etc. as label-only config_settings is intentional, and the fallback path is documented.
  • [6.1] Tests — distributed/cross-platform changes have matching coverage → issue: pip_ppu_torch switches to the cuda12_9 lockfile, changing the PPU platform's torch version
    After the lockfile switch, PPU torch resolution jumps from cu126/torch2.6 to cu129/torch2.8, and the PR description lists no PPU CI verification; ROCm/cuda12_arm now resolve from the regenerated lockfiles by default and also need matching platform CI validation.

Strengths

  • Every deletion/default switch carries a clear rationale comment with rollback guidance (e.g. why label-only using_arm is kept in BUILD, and the note on switching //conditions:default to cu129 in arch_select.bzl)
  • The PIP_EXTRA_ARGS comment explains clearly that --index-url must override the container env PIP_INDEX_URL (so the opensource side cannot silently hit the internal artlab); the strict-separation rule is well written
  • The TODO comment at the top of http.bzl lists the http_archives to be merged in Phase 5 and the multi-URL fallback strategy, easing the follow-up

LLLLKKKK force-pushed the feature/pip_unify_v2 branch from ef772e7 to e4f6072 on May 1, 2026 10:27
@LLLLKKKK
Collaborator Author

LLLLKKKK commented May 1, 2026

AI Code Review - PR #962

Status: BLOCKING

Summary: P0/1 · P1/0 · P2/1 · P3/1

Blocking Issues

P0

  • open-source arch_select.bzl still loads @pip_gpu_cuda12_torch but the pip_parse was deleted, so opensource workspace evaluation will fail @ arch_config/arch_select.bzl:4
    • Suggestion: pick one: (1) remove the requirement_gpu_cuda12 load from arch_select.bzl and fold the cuda_pre_12_9 branch into using_cuda12_9_x86 or default; (2) if the cuda12 (CUDA 12.4/12.6) opensource build path must stay, re-add pip_parse(name="pip_gpu_cuda12_torch", ...) in deps/pip.bzl and call install_deps in WORKSPACE. Verify first that both bazel build --config=cuda12_9 //:th_transformer and bazel build --config=cuda12 //:th_transformer pass in a pure opensource worktree.

Non-blocking Suggestions

P2

  • BUILD keeps the using_arm / using_cpu config_settings but no build:config sets the corresponding define, so select branches silently take the default @ BUILD:62
    • Suggestion: alongside the comment, give using_arm/using_cpu a deprecation = "..." attribute or add a cleanup tracker to the comment (issue link or follow-up PR), and grep for using_arm / using_cpu references across repos to also delete the branches that are known to be dead.

P3

  • PIP_EXTRA_ARGS overrides the environment variable via --index-url; this behavior change deserves more prominent mention in the PR description/migration notes @ deps/pip.bzl:1
    • Suggestion: add a section to the PR description or any existing README/HOWTO under deps/: list the current 5 indexes, emphasize that no internal mirror is used, and state the execution constraints for bazel run //deps:requirements_<cfg>.update.

Checklist Violations (2 fail / 23 total)

General Principles Checklist

  • [6.1] Architecture — compatibility: public API / persisted data / config / environment migrations are safe → issue: open-source arch_select.bzl still loads @pip_gpu_cuda12_torch but the pip_parse was deleted, so opensource workspace evaluation will fail
    arch_config/arch_select.bzl still loads @pip_gpu_cuda12_torch, but deps/pip.bzl deletes the corresponding pip_parse and WORKSPACE deletes install_deps; any --config build of the opensource repo fails outright on the unresolved repository, breaking opensource-side compatibility.
  • [6.1] Quality — the PR description explains motivation and design → issue: PIP_EXTRA_ARGS overrides the environment variable via --index-url; this behavior change deserves more prominent mention in the PR description/migration notes
    deps/pip.bzl hard-overrides PIP_INDEX_URL with --index-url and adds 4 download.pytorch.org extra-index-urls, a policy change affecting every future lock-regeneration flow; the PR description / deps README should explicitly list the 5 indexes and the execution constraints.

Strengths

  • Lockfiles are updated per platform (cuda12 / cuda12_9 / rocm / arm / cpu), and the torch install source moves from the aliyun mirror to download.pytorch.org, making the wheel links reproducible.
  • The comments in deps/http.bzl and deps/pip.bzl clearly describe why --index-url now overrides the container env and why the mirror is unusable, helping future maintainers understand the decision background.

LLLLKKKK force-pushed the feature/pip_unify_v2 branch from e4f6072 to 735509e on May 1, 2026 13:35
@LLLLKKKK
Collaborator Author

LLLLKKKK commented May 1, 2026

AI Code Review - PR #962

Status: BLOCKING

Summary: P0/1 · P1/2 · P2/2 · P3/0

Blocking Issues

P0

  • arch_select.bzl still loads the deleted @pip_gpu_cuda12_torch @ arch_config/arch_select.bzl:3
    • Suggestion: delete the @pip_gpu_cuda12_torch load and the cuda_pre_12_9 → requirement_gpu_cuda12 branch from arch_select.bzl in the same change; if the cu126 entry point must stay, restore the corresponding pip_parse in deps/pip.bzl and the WORKSPACE install_deps. Before landing, run bazel query //... --config=cuda12_9 once in the opensource workspace to verify arch_select.bzl can be loaded.

P1

  • The pip_ppu_torch lockfile switches from cuda12 to cuda12_9, a major torch version upgrade @ deps/pip.bzl:25
    • Suggestion: decide the target torch/CUDA version for PPU. If PPU still needs cu126, keep requirements_lock_torch_gpu_cuda12.txt and its lockfile; if the upgrade is intentional, verify a bazel build plus an inference-startup smoke test in the PPU container and note the impact in the PR description.
  • nvidia-cutlass-dsl rolls back from 4.4.1 to 4.3.5 @ deps/requirements_lock_torch_gpu_cuda12_9.txt:2247
    • Suggestion: explain the rollback to 4.3.5 in the PR description (whether 4.4.1 hit a known bug, or it is for flashinfer 0.6.6 compatibility). Trigger a cuda12_9 perf/smoke run to verify the GEMM/MoE paths have not regressed.

Non-blocking Suggestions

P2

  • flash-mla commits differ between the cuda12_arm and cuda12_9 lockfiles @ deps/requirements_cuda12_arm.txt:9
    • Suggestion: unify the commit on both sides (preferably the newer ca58fed for both), or record in the lockfile/upper-level README why arm temporarily stays on the old commit plus a tracking issue. Otherwise later perf/correctness diffs cannot quickly isolate the source of divergence.
  • The opensource .bazelrc drops build:cpu/build:arm but the config_settings kept in BUILD have no cleanup landed @ BUILD:62
    • Suggestion: list the follow-up cleanup plan in the PR description (when the :using_arm/:using_cpu select branches in rtp_llm/BUILD and barex_rdma/BUILD will be removed), or simply clean up those select branches in this PR to avoid long-lived dead code.

Checklist Violations (6 fail / 23 total)

General Principles Checklist

  • [6.1] Architecture — dependency direction: no cycles / cross-layer surprises → issue: arch_select.bzl still loads the deleted @pip_gpu_cuda12_torch
    arch_select.bzl still loads @pip_gpu_cuda12_torch, but deps/pip.bzl no longer registers that external repository, so the opensource bazel loading phase fails outright — a dependency edge now points at nothing.
  • [6.1] Architecture — compatibility: public API / persisted data / config / environment migrations are safe → issue: the pip_ppu_torch lockfile switches from cuda12 to cuda12_9, a major torch version upgrade
    The PPU lockfile moves from cu126/torch 2.6 to cu129/torch 2.8, across a CUDA minor version, with compatibility against the PPU container's CUDA 12.3 base unverified; meanwhile nvidia-cutlass-dsl rolls back from 4.4.1 to 4.3.5. Both are implicit public dependency migrations whose impact on existing builds/deployments needs to be assessed in the PR description.
  • [6.1] Tests — new logic has focused unit tests plus related integration/smoke tests → issue: arch_select.bzl still loads the deleted @pip_gpu_cuda12_torch
    The PR changes .bazelrc / WORKSPACE / pip.bzl / arch_select.bzl / 7 lockfiles but attaches no bazel build verification evidence; the TODO comment in http.bzl itself says "requires per-config bazel build verification before landing".
  • [6.1] Tests — distributed/cross-platform changes have matching coverage → issue: flash-mla commits differ between the cuda12_arm and cuda12_9 lockfiles
    The PR touches lockfiles for cuda12 / cuda12_9 / cuda12_9_arm / rocm / ppu, yet contains no cross-platform bazel build verification; the flash-mla arm/x86 commit mismatch also lacks a cross-platform consistency check.
  • [6.1] Quality — mega-PRs are split into independent changes → issue: the pip_ppu_torch lockfile switches from cuda12 to cuda12_9, a major torch version upgrade
    The PR simultaneously (a) deletes the cpu/arm build configs, (b) reworks the pip index/lockfile format, (c) upgrades PPU torch across a major version, and (d) rolls back the cutlass-dsl version. (c) and (d) are orthogonal to the pip_unify theme and could be split out for separate review/rollback.
  • [6.1] Quality — the PR description explains motivation and design → issue: the opensource .bazelrc drops build:cpu/build:arm but the config_settings kept in BUILD have no cleanup landed
    The PR does not explain (a) the reason for the cutlass-dsl 4.4.1→4.3.5 rollback, (b) whether the PPU switch to cu129 was verified, or (c) the cleanup plan for the dead select branches left in BUILD. All of these are key "why" information readers need.

Strengths

  • The new rtp_opensource_deps local_repository lets the internal overlay reuse the opensource lockfiles, avoiding drift between two -r wrapper copies
  • PIP_EXTRA_ARGS explicitly overrides the container PIP_INDEX_URL with --index-url, and the comment clearly explains the artlab isolation rule
  • Every deletion point in BUILD/WORKSPACE/pip.bzl/http.bzl carries a rationale comment (CPU/ARM path retirement, SJTU mirror unavailability), easing future tracing
  • Switching from bare URLs to PyPI semver pins makes the lockfiles readable and reusable by uv, and tightens hash verification

Copilot AI review requested due to automatic review settings May 1, 2026 15:28
Contributor

Copilot AI left a comment


Pull request overview

This PR updates the Bazel/Python dependency workflow to resolve wheels from a unified set of PEP 503 indexes, switching to “thin” per-platform requirement inputs (shared requirements_base.txt) and regenerated lockfiles so the same filenames/versions resolve consistently across environments.

Changes:

  • Standardize pip index configuration via deps/pip.bzl (--index-url + PEP 503 --extra-index-url list).
  • Replace direct wheel URLs in requirement inputs with pinned package specs and PEP 503/simple-based references; regenerate lockfiles accordingly.
  • Remove legacy CPU/CUDA12(cu126) requirement sources/targets and adjust Bazel selects and WORKSPACE wiring.

Reviewed changes

Copilot reviewed 16 out of 19 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
deps/pip.bzl Defines unified pip index configuration and the remaining pip_parse repos.
deps/BUILD Exports per-config requirements files and removes compile targets for dropped configs.
deps/http.bzl Removes CPU torch http_archives; adds TODOs for future consolidation with internal overlay.
deps/requirements_torch_gpu_cuda12_9.txt Converts to pinned “thin” input requirements for cu129 x86_64.
deps/requirements_lock_torch_gpu_cuda12_9.txt Regenerated lockfile using unified index set.
deps/requirements_rocm.txt Updates ROCm requirements to use pinned specs and unified indexes.
deps/requirements_lock_rocm.txt Regenerated ROCm lockfile using unified index set.
deps/requirements_cuda12_arm.txt Adds pinned CUDA12-arm requirements aligned with unified indexes.
deps/requirements_lock_cuda12_arm.txt Regenerated CUDA12-arm lockfile using unified index set.
arch_config/arch_select.bzl Removes CPU/ARM-CPU pip repos, changes default Python deps selection, and adjusts torch deps selection.
WORKSPACE Adds rtp_opensource_deps and removes some pip repo installs; still loads @pip_ppu_torch.
BUILD Keeps using_arm/using_cpu config_settings as “label-only” placeholders.
.bazelrc Removes --config=cpu and --config=arm sections.
deps/requirements_torch_gpu_cuda12.txt (deleted) Removes legacy cu126 GPU requirements input.
deps/requirements_torch_cpu.txt (deleted) Removes legacy CPU torch requirements input.
deps/requirements_cpu_arm.txt (deleted) Removes legacy ARM CPU requirements input.


Comment thread WORKSPACE
Comment on lines 49 to 51
load("@pip_ppu_torch//:requirements.bzl", pip_ppu_torch_install_deps = "install_deps")
pip_ppu_torch_install_deps()

Comment thread arch_config/arch_select.bzl Outdated
Comment on lines +2 to +9
load("@pip_gpu_cuda12_9_torch//:requirements.bzl", requirement_gpu_cuda12_9="requirement")
load("@pip_gpu_rocm_torch//:requirements.bzl", requirement_gpu_rocm="requirement")
load("@rtp_llm//bazel:defs.bzl", "copy_so")

# cuda12 (cu126/torch2.6) was dropped — pip_gpu_cuda12_torch no longer registered
# in deps/pip.bzl. Remaining `cuda_pre_12_9` select branches resolve via cuda12_9.
requirement_gpu_cuda12 = requirement_gpu_cuda12_9

@@ -77,26 +80,16 @@ def torch_deps():
"@torch_rocm//:torch",
"@torch_rocm//:torch_libs",
],
@LLLLKKKK
Collaborator Author

LLLLKKKK commented May 1, 2026

AI Code Review - PR #962

Status: LGTM

Summary: P0/0 · P1/0 · P2/1 · P3/1

lgtm ready to ci

Non-blocking Suggestions

P2

  • The cuda_pre_12_9 path mismatches torch native and Python wheel ABIs @ arch_config/arch_select.bzl:8
    • Suggestion: fully retire the cu126 path: delete the build:cuda12 ... config group from .bazelrc, torch_2.6_py310_cuda from deps/http.bzl, and the cuda_pre_12_9 select branch plus the requirement_gpu_cuda12 alias from arch_select.bzl; or, if the cuda12 config is kept, add a matching cu126 pip_parse + lockfile so no intermediate state mixes native and Python versions. The current inline comment only acknowledges the intent without actually converging.

P3

  • using_arm/using_cpu config_settings are kept while the matching --config is removed, so select silently takes the default @ BUILD:62
    • Suggestion: mark the config_setting deprecated (BUILD comment plus removing the select branches at the reference sites), or add an explicit disabling hint in .bazelrc; the current comment only explains why the label is kept, not that it is already a dead branch in select.

Checklist Violations (3 fail / 23 total)

General Principles Checklist

  • [6.1] Architecture — error semantics: fail-fast/retry/fallback/silent behavior is explicit → issue: using_arm/using_cpu config_settings are kept while the matching --config is removed, so select silently takes the default
    BUILD:62-67 explains that the using_arm/using_cpu config_settings are kept because select still references them, but the corresponding --config=arm/cpu is gone: a user following old docs who passes --define=using_arm=true will never hit those branches and silently falls through to default instead of getting an explicit error.
  • [6.1] Architecture — compatibility: public API / persisted data / config / environment migrations are safe → issue: the cuda_pre_12_9 path mismatches torch native and Python wheel ABIs
    requirement_gpu_cuda12 = requirement_gpu_cuda12_9 at arch_config/arch_select.bzl:8 silently switches the cu126 path's Python wheels to cu129, while --config=cuda12 still exists in .bazelrc and torch_deps still loads torch_2.6_py310_cuda, producing a native/Python ABI mismatch. Original cuda12 users who still use that config get a mixed stack that is unusable at runtime.
  • [6.1] Tests — distributed/cross-platform changes have matching coverage → checklist-only
    The PR touches lockfiles on three platforms (cuda12_9 / cuda12_arm / rocm), but the PR description gives no local or CI verification evidence for cuda12_arm and rocm; in particular, the cuda_pre_12_9 fallback is unverified under a cuda12 build.

Strengths

  • Every deletion/redirection carries an inline comment stating the motivation (pip.bzl rejecting artlab, the SJTU mirror's 403 behavior, the cuda_pre_12_9 fallback path), keeping future tracing cheap
  • The lockfiles cover cuda12_arm / rocm / cuda12_9 in one sweep, so the pip-compile→uv switch cannot leave some configs drifting
  • WORKSPACE removes the retired install_deps calls in sync, keeping pip_parse registrations and install_deps calls one-to-one

LLLLKKKK force-pushed the feature/pip_unify_v2 branch from 0a1800a to 642ed26 on May 1, 2026 15:43
@wht21
Collaborator

wht21 commented May 1, 2026

internal source has been updated, please review the changes!

Copilot AI review requested due to automatic review settings May 1, 2026 16:11
Contributor

Copilot AI left a comment


Pull request overview

Updates the Bazel/Python dependency plumbing to use a unified set of PEP 503 indexes (Aliyun PyPI mirror + PyTorch per-platform indexes + RTP OSS “simple” bucket) and shifts the repository’s default CUDA build/test config from cuda12_6 to cuda12_9.

Changes:

  • Replace per-platform “full” requirements with thin inputs and regenerated lockfiles (now generated by uv).
  • Simplify Bazel pip parsing to focus on cuda12_9 + cuda12_arm + ROCm and remove older CPU/CUDA12.6-era branches/configs.
  • Update scripts/docs/test configs to use --config=cuda12_9.

Reviewed changes

Copilot reviewed 30 out of 33 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
rtp_llm/test/perf_test/multi_node/multi_runner.sh Updates default Bazel build args to --config=cuda12_9.
rtp_llm/test/perf_test/multi_node/multi_benchmark_config.yaml Updates benchmark build args to --config=cuda12_9.
rtp_llm/models_py/standalone/BUILD Removes cuda_pre_12_9 select branches.
rtp_llm/models_py/modules/hybrid/test/BUILD Removes cuda_pre_12_9 select branches for test deps.
rtp_llm/models_py/modules/factory/attention/cuda_impl/test/BUILD Removes cuda_pre_12_9 select branches for test deps.
rtp_llm/models_py/modules/factory/attention/cuda_cp_impl/test/BUILD Removes cuda_pre_12_9 select branches for test deps.
rtp_llm/models_py/kernels/cuda/test/BUILD Removes commented cuda_pre_12_9 branch in a disabled test stanza.
rtp_llm/BUILD Removes ARM/CPU-specific select branches and cuda_pre_12_9 branches in Python deps lists.
docs/start/install.md Updates example build command/config to cuda12_9.
docs/references/profiling.md Updates profiling command to cuda12_9.
docs/references/debug.md Updates debug test command to cuda12_9.
docs/benchmark/benchmark.md Updates benchmark doc snippet to cuda12_9.
docs/backend/3fs.md Updates build example to cuda12_9.
deps/requirements_torch_gpu_cuda12_9.txt Converts to normalized names + PEP503 style references/pins for unified resolution.
deps/requirements_torch_gpu_cuda12.txt Removes legacy CUDA12.6 requirements input.
deps/requirements_torch_cpu.txt Removes legacy CPU requirements input.
deps/requirements_rocm.txt Switches ROCm requirements to unified indexes and pinned names/versions.
deps/requirements_lock_torch_gpu_cuda12_9.txt Regenerated lockfile (uv) with unified index configuration and hashes.
deps/requirements_lock_rocm.txt Regenerated lockfile (uv) with unified index configuration and hashes.
deps/requirements_lock_cuda12_arm.txt Regenerated lockfile (uv) for the cuda12_arm config.
deps/requirements_cuda12_arm.txt Defines thin cuda12_arm requirements input (torch/torchvision + custom wheels).
deps/requirements_cpu_arm.txt Removes legacy ARM-CPU requirements input.
deps/pip.bzl Defines unified PIP_EXTRA_ARGS; drops older pip repos and adds pip_cuda12_arm_torch.
deps/http.bzl Removes older torch http_archives; keeps torch 2.8 CUDA wheel and ROCm torch wheel.
deps/BUILD Exports per-config requirement sources and drops legacy compile targets.
arch_config/arch_select.bzl Removes CPU/ARM-CPU requirement routing and cuda_pre_12_9 routing; simplifies torch deps selection.
WORKSPACE Adds rtp_opensource_deps repo and removes legacy pip installs, while keeping CUDA12.9/ROCm/ARM installs.
BUILD.pytorch Removes cuda_pre_12_9 linkopts branch.
BUILD Removes cuda_pre_12_9, using_arm, using_cpu config settings; updates compdb refresh target to cuda12_9.
.bazelrc Removes cuda12_2, cuda12_6, cpu, arm configs; keeps cuda12_9 configs.


Comment thread WORKSPACE
Comment on lines 49 to 51
load("@pip_ppu_torch//:requirements.bzl", pip_ppu_torch_install_deps = "install_deps")
pip_ppu_torch_install_deps()

@LLLLKKKK
Collaborator Author

LLLLKKKK commented May 1, 2026

AI Code Review - PR #962

Status: LGTM

Summary: P0/0 · P1/0 · P2/1 · P3/3

lgtm ready to ci

Non-blocking Suggestions

P2

  • The requirement() default branch makes cuda12_arm pull the x86 cuda12_9 wheel @ arch_config/arch_select.bzl:15
    • Suggestion: at the top of arch_select.bzl, load @pip_cuda12_arm_torch//:requirements.bzl as requirement_cuda12_arm and add an @rtp_llm//:using_cuda12_arm branch in requirement()/torch_deps() routing to that lockfile; or explicitly declare cuda12_arm unsupported (and withdraw the pip_cuda12_arm_torch registration plus the cuda12_9_arm bazelrc/requirements_lock). Keeping the cuda12_arm config while not wiring it up is inconsistent.

P3

  • The whl_deps() comment/default value disagrees with the torch version in the cuda12_arm lockfile @ arch_config/arch_select.bzl:59
    • Suggestion: either correct the comment to 'cuda12_9_x86 uses torch 2.8+cu129; cuda12_9_arm uses torch 2.9+cu129' and add a dedicated select branch for cuda12_arm returning torch==2.9.0+cu129, or align the version with the cuda12_arm lockfile.
  • rtp-kernel/deep-ep versions drift between the cuda12_9 and cuda12_arm lockfiles @ deps/requirements_lock_cuda12_arm.txt:1
    • Suggestion: have the update_pip flow publish wheels built from the same upstream commit to both indexes and regenerate both lockfiles in one pass; or state explicitly in the PR description why version drift is acceptable.
  • requirement()'s default branch downloads the cuda12_9 wheel, hurting the pure-CPU test experience @ arch_config/arch_select.bzl:20
    • Suggestion: if the team no longer supports the CPU/ARM-CPU paths, make build --config=cuda12_9 the implicit default in .bazelrc; otherwise give the default branch a lightweight empty fallback or an explicit error, so huge wheels are not pulled silently.
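The P2 routing fix could look roughly like this. This is a sketch only: requirement_cuda12_arm and the :using_cuda12_arm config_setting are names taken from the suggestion, not verified against the actual arch_select.bzl.

```starlark
# Sketch of the suggested routing in arch_config/arch_select.bzl.
# `requirement_cuda12_arm` and `:using_cuda12_arm` are assumed names.
load("@pip_cuda12_arm_torch//:requirements.bzl", requirement_cuda12_arm = "requirement")

def requirement(name):
    return select({
        "@rtp_llm//:using_cuda12_9_x86": [requirement_gpu_cuda12_9(name)],
        # Explicit ARM branch so cuda12_arm no longer falls through
        # to the x86 cuda12_9 lockfile.
        "@rtp_llm//:using_cuda12_arm": [requirement_cuda12_arm(name)],
        "@rtp_llm//:using_rocm": [requirement_gpu_rocm(name)],
        "//conditions:default": [requirement_gpu_cuda12_9(name)],
    })
```

The alternative resolution named in the same suggestion is to drop the cuda12_arm config entirely rather than wire it up.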

Checklist Violations (3 fail / 25 total)

General Principles Checklist

  • [6.1] Architecture — Error semantics: fail-fast/retry/fallback/silent behavior must be explicit → issue: the requirement() default branch routes cuda12_arm to the x86 cuda12_9 wheel
    The arch_select.bzl default branch swallows unknown configs (including cuda12_arm) and silently routes them to the cuda12_9 (x86) wheel with no fail-fast check; downstream builds only blow up at run/import time, after a wrong-architecture wheel has already been pulled.
  • [6.1] Architecture — Compatibility: public API/persistent data/config/environment migrations must be safe → issue: requirement()'s default branch downloads the cuda12_9 wheel, hurting the pure-CPU test experience
    The public build entry points --config=cuda12_2/cuda12_6/cpu/arm are removed in one shot; external users and historical scripts pinned to these configs will fail immediately (the docs were updated, but open-source users' old scripts cannot know that). The PR description includes no migration guide or deprecation notice.
  • [6.1] Tests — Distributed/cross-platform changes need matching coverage → issue: the requirement() default branch routes cuda12_arm to the x86 cuda12_9 wheel
    The PR touches the cross-architecture (x86 / aarch64 / rocm) dependency graph, but the PR description/commit message provides no evidence that cuda12_9_arm and rocm CI builds pass; whether arch_select.bzl still routes cuda12_arm correctly after the consolidation needs independent verification.

Strengths

  • Dead configs such as cuda12_2/cuda12_6/cpu/arm/cpu_latest/cuda_pre_12_9 and their select branches are removed wholesale, noticeably cutting future BUILD/select maintenance cost.
  • The strict-separation comment at the top of pip.bzl spells out the roles of artlab, aliyun, download.pytorch.org, and the rtp-opensource OSS bucket, plus why the SJTU mirror cannot be used, making the decision traceable.
  • The URL-pin → PEP 503 index migration converges wheel sources across internal and open-source builds, makes caches reusable, and switches lockfile generation to uv (more compact output).
  • The new rtp_opensource_deps handle and the retained explicit exports_files declarations make the data flow where internal_source reuses the opensource lockfiles via --override_repository explicit and readable.
  • The pip_ppu_torch stub comment in WORKSPACE clearly states that opensource-only builds never actually download a PPU wheel, so external users are not blocked on PPU resources.
  • Docs (3fs/benchmark/debug/profiling/install) and perf_test scripts (multi_runner.sh, multi_benchmark_config.yaml) are updated from cuda12_6 → cuda12_9 in sync, leaving no dead references.

@wht21
Collaborator

wht21 commented May 1, 2026

internal source has been updated, please review the changes!

LLLLKKKK added a commit that referenced this pull request May 1, 2026
utils/util.py imports aiohttp but the py_library target never declared the
pip dep. Surfaced on ut-sm8x in PR #962 (util_test, duplicated_kv_test →
ModuleNotFoundError: No module named 'aiohttp'). Likely latent for a while
— only visible when cuda12_9_x86 with a strict py_test runfiles sandbox
runs these targets.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 1, 2026 18:04
Contributor

Copilot AI left a comment


Pull request overview

Unifies Bazel/rules_python dependency resolution around a single set of PEP 503 indexes (Aliyun PyPI mirror + download.pytorch.org per backend + RTP OSS “simple” index for custom wheels), while also standardizing the repo’s default CUDA build path onto cuda12_9 and regenerating lockfiles accordingly.

Changes:

  • Reworked deps/pip.bzl + requirements inputs/lockfiles to resolve from PEP 503 indexes (and removed legacy CPU / older CUDA12 inputs).
  • Updated Bazel config surface to remove cuda12_6 / CPU / ARM-CPU settings and align scripts/docs/tests on cuda12_9.
  • Added an opensource deps repository handle (rtp_opensource_deps) for internal overlay reuse.

Reviewed changes

Copilot reviewed 30 out of 33 changed files in this pull request and generated 2 comments.

File Description
rtp_llm/test/perf_test/multi_node/multi_runner.sh Switches default multi-node build/test config from cuda12_6 → cuda12_9.
rtp_llm/test/perf_test/multi_node/multi_benchmark_config.yaml Updates benchmark build args to cuda12_9.
rtp_llm/models_py/standalone/BUILD Drops cuda_pre_12_9 select branches in standalone deps.
rtp_llm/models_py/modules/hybrid/test/BUILD Removes cuda_pre_12_9 flashinfer deps branch.
rtp_llm/models_py/modules/factory/attention/cuda_impl/test/BUILD Removes cuda_pre_12_9 flashinfer deps branches for tests.
rtp_llm/models_py/modules/factory/attention/cuda_cp_impl/test/BUILD Removes cuda_pre_12_9 flashinfer deps branches for CP tests.
rtp_llm/models_py/kernels/cuda/test/BUILD Removes commented cuda_pre_12_9 select branch.
rtp_llm/BUILD Removes legacy ARM/CPU selects, adds aiohttp to deps list, and keeps CUDA12.9/ARM/ROCm selects.
docs/start/install.md Updates example Bazel build command to cuda12_9.
docs/references/profiling.md Updates profiling example to cuda12_9.
docs/references/debug.md Updates debug/test example to cuda12_9.
docs/benchmark/benchmark.md Updates benchmark doc snippet to cuda12_9.
docs/backend/3fs.md Updates 3FS build example to cuda12_9.
deps/requirements_torch_gpu_cuda12_9.txt Converts to “thin” named requirements resolving via unified indexes.
deps/requirements_torch_gpu_cuda12.txt Removes legacy CUDA12 (cu126) input requirements file.
deps/requirements_torch_cpu.txt Removes legacy CPU torch requirements file.
deps/requirements_rocm.txt Converts ROCm requirements to named deps aligned with unified indexes.
deps/requirements_lock_torch_gpu_cuda12_9.txt Regenerated lockfile using the new index set and inputs.
deps/requirements_lock_rocm.txt Regenerated ROCm lockfile using the new index set and inputs.
deps/requirements_lock_cuda12_arm.txt Regenerated CUDA12 ARM lockfile using the new index set and inputs.
deps/requirements_cuda12_arm.txt New thin CUDA12 ARM requirements source.
deps/requirements_cpu_arm.txt Removes legacy ARM CPU requirements file.
deps/pip.bzl Centralizes index config (explicit --index-url + extra indexes) and streamlines pip_parse repos to CUDA12.9/ARM/ROCm (+ PPU stub).
deps/http.bzl Removes legacy torch http_archives (cpu + cu126), keeps torch 2.8 CUDA archive and other deps.
deps/BUILD Exports per-config requirements sources and removes compile rules for deleted CPU/cu126 variants.
arch_config/arch_select.bzl Simplifies requirement selection toward CUDA12.9/ROCm and drops legacy CPU/cu126 selects.
WORKSPACE Adds rtp_opensource_deps repo handle and updates pip install_deps set (CUDA12.9/ARM/ROCm).
BUILD.pytorch Removes pre-12.9 CUDA linkopts branch.
BUILD Removes cuda_pre_12_9, using_arm, using_cpu config_settings; updates compdb config.
.bazelrc Removes cuda12_2, cuda12_6, cpu, and arm configs; keeps cuda12_9_* and ROCm.


Comment on lines 15 to 22
```diff
 deps = select({
-    "@rtp_llm//:cuda_pre_12_9": [requirement_gpu_cuda12(name)],
     "@rtp_llm//:using_cuda12_9_x86": [requirement_gpu_cuda12_9(name)],
     "@rtp_llm//:using_rocm": [requirement_gpu_rocm(name)],
-    "@rtp_llm//:using_arm": [requirement_arm(name)],
-    "//conditions:default": [requirement_cpu(name)],
+    # Default falls through to cuda12_9 (the canonical x86 GPU build).
+    # CPU-only and ARM-CPU configs were removed; if you need them,
+    # restore the corresponding pip_parse + lockfile + select branch.
+    "//conditions:default": [requirement_gpu_cuda12_9(name)],
 }),

-    "@rtp_llm//:using_cuda12": ["torch==2.6.0+cu126"],
     "@rtp_llm//:using_rocm": ["pyrsmi==0.2.0", "amdsmi@https://sinian-metrics-platform.oss-cn-hangzhou.aliyuncs.com/kis%2FAMD%2Famd_smi%2Fali%2Famd_smi.tar", "aiter@https://sinian-metrics-platform.oss-cn-hangzhou.aliyuncs.com/kis/AMD/RTP/aiter-0.1.13.dev14%2Bgfa35072d0.d20260402-cp310-cp310-linux_x86_64.whl"],
-    "//conditions:default": ["torch==2.1.2"],
+    # Default covers cuda12_9_x86, cuda12_9_arm (both use torch 2.8+cu129).
```
@LLLLKKKK
Collaborator Author

LLLLKKKK commented May 1, 2026

AI Code Review - PR #962

Status: LGTM

Summary: P0/0 · P1/0 · P2/0 · P3/0

lgtm ready to ci

Checklist ✅ (25 items passed)

Strengths

  • The incremental fix is precisely targeted: utils/util.py:11 imports aiohttp at top level but the //rtp_llm:utils py_library never declared the dependency; this commit appends a single :aiohttp line to resolve the ModuleNotFoundError without introducing extra coupling.
  • The commit message states the root cause (missing dep declaration), the trigger condition (cuda12_9_x86 strict py_test runfiles sandbox), and the affected tests (util_test, duplicated_kv_test), which will ease future bisects.
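The one-line fix described above would look something like this in rtp_llm/BUILD. This is a sketch: the surrounding srcs/deps are illustrative, and only the :aiohttp line corresponds to the actual change.

```starlark
# rtp_llm/BUILD (sketch) -- declare the pip dep that utils/util.py imports.
py_library(
    name = "utils",
    srcs = glob(["utils/**/*.py"]),  # illustrative; actual srcs may differ
    deps = [
        ":aiohttp",  # the added line: utils/util.py does `import aiohttp`
        # ... other existing deps ...
    ],
)
```

Declaring the dep directly on the target that imports it (rather than relying on a transitive provider) is what makes the target survive a strict runfiles sandbox.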

@wht21
Collaborator

wht21 commented May 1, 2026

internal source has been updated, please review the changes!

…CE/utils fixes

Squashed roll-up of the pip-unification work for the opensource side.

Strict separation invariant (Phase 2):
- Opensource builds may only consume mirrors.aliyun.com/pypi/simple/ +
  rtp-opensource OSS + download.pytorch.org/whl/<cfg>/.
- Internal builds may only consume artlab.alibaba-inc.com + rtp-opensource OSS.
- The rtp-opensource OSS bucket is the SINGLE shared mirror for RTP-LLM
  custom wheels (flash_attn, deep_ep, deep_gemm, flashinfer_*, rtp_kernel,
  etc.) — same URL serves both sides.

Build cfg cleanup:
- Drop cuda12 (cu126/torch2.6) build path entirely; cuda12_9 (cu129/torch2.8)
  is now the canonical x86 GPU build. cuda_pre_12_9 config_setting + every
  select branch that used it removed (latent ABI mismatch: cu126 native libs
  with torch 2.8 Python deps from the cu129 pip_parse).
- Drop using_arm / using_cpu config_settings + their select branches —
  --config=arm / --config=cpu bazelrc entries were removed with the lockfiles
  so the [] branches were unreachable and default branches (decord,
  xfastertransformer_devel) silently applied.
- Drop dead pip_ppu_torch lockfile registration from opensource pip.bzl
  (no using_ppu select branch in opensource arch_select.bzl).
- Drop unused cu126 + cpu --extra-index-url from opensource pip.bzl —
  halves per-package query surface against download.pytorch.org and
  reduces regen timeout flakiness.
- Drop CPU-only and ARM-CPU build paths (requirements_torch_cpu.txt,
  requirements_torch_arm.txt, requirements_cpu_arm.txt) — no bazel cfg
  wires them up anymore.
- Drop build:cuda12_2 / build:cuda12_6 from .bazelrc; retarget multi-node
  perf test scripts/yaml + 5 docs (install/benchmark/debug/profiling/3fs)
  to --config=cuda12_9.
- Delete torch_2.6_py310_cuda http_archive + cuda_pre_12_9 branch in
  BUILD.pytorch linkopts.

Shared-wheel hygiene:
- aarch64 wheels retagged manylinux_2_28_aarch64 (uv refuses bare
  linux_aarch64 under aarch64-manylinux python-platform) and uploaded to OSS
  for direct version-pin.
- URL-pin patched wheels to OUR OSS so resolver can't silently swap to an
  upstream copy with different bytes (flash_attn 2.8.3+cu12torch2.8cxx11abiTRUE,
  flashinfer-python 0.6.6, flashinfer-cubin 0.6.6 — we host patched variants).
- nvidia-cutlass-dsl pinned 4.4.1 (latest on artlab; hash matches across
  artlab + aliyun mirror); pulls libs-base==4.4.1 transitive.

Whl_deps tightening:
- whl_deps() select key changed from :using_cuda12 → :using_cuda12_9_x86 +
  default. The :using_cuda12 key over-matched every CUDA variant
  (cuda12_9 inherits build:cuda12 which sets using_cuda12=true), so cu129
  wheels were baking torch==2.6.0+cu126 into install_requires.

WORKSPACE/pip stub:
- WORKSPACE unconditionally calls pip_ppu_torch_install_deps(). Internal
  builds satisfy that via --override_repository=rtp_deps but opensource
  builds don't — so opensource-only builds errored 'Failed to load
  Starlark extension @pip_ppu_torch//:requirements.bzl'. Register
  pip_ppu_torch in opensource pip.bzl as a lazy alias of the cuda12_9
  lockfile (pip_parse is lazy, no opensource select depends on it, so
  no PPU wheel is ever fetched).

py_library deps:
- //rtp_llm:utils now declares :aiohttp. utils/util.py imports aiohttp
  but the target had no pip dep — surfaced as ModuleNotFoundError on
  ut-sm8x's util_test + duplicated_kv_test once the strict py_test
  runfiles sandbox started enforcing it.

Verified end-to-end on Aone CI pipeline 1346, run 39069908: all 28 jobs
SUCCESS (cuda12_9 / cuda12_9_arm / amd / ppu / frontend builds + ut +
smoke + perf + open_source variants).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
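For readers who have not opened deps/pip.bzl, the unified index set described in the commit message would look roughly like this. The OSS bucket URL is a placeholder and the variable name is assumed, not copied from the file.

```starlark
# Sketch of the unified PEP 503 index set (opensource side); names and
# exact URLs are illustrative, and the rtp-opensource bucket URL is a
# placeholder.
PIP_EXTRA_ARGS = [
    # Primary index: Aliyun PyPI mirror.
    "--index-url=https://mirrors.aliyun.com/pypi/simple/",
    # Per-backend torch wheels; cu129 is the canonical x86 GPU config.
    "--extra-index-url=https://download.pytorch.org/whl/cu129",
    # Single shared mirror for RTP-LLM custom wheels
    # (flash_attn, deep_ep, deep_gemm, flashinfer_*, rtp_kernel, ...).
    "--extra-index-url=https://<rtp-opensource-oss-bucket>/simple/",
]
```

With PEP 503 indexes, the resolver matches wheels by filename under `<index>/<normalized-package-name>/`, which is what lets the same pinned versions resolve identically on every platform.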
@LLLLKKKK LLLLKKKK force-pushed the feature/pip_unify_v2 branch from fe60498 to d0a6fd8 Compare May 2, 2026 01:53
@LLLLKKKK
Collaborator Author

LLLLKKKK commented May 2, 2026

AI Code Review - PR #962

Status: LGTM

Summary: P0/0 · P1/0 · P2/1 · P3/2

lgtm ready to ci

Non-blocking Suggestions

P2

  • The cuda12_arm dependency set expands substantially; aarch64 wheel compatibility needs verification @ deps/requirements_cuda12_arm.txt:1
    • Suggestion: before merging, run smoke tests plus basic import checks on the cuda12_9_arm platform (at least flash-attn / flash-attn-3 / deep-gemm / flashinfer-python), or state in the PR description which packages are intentionally stub-resolved (never triggered by an import at runtime).

P3

  • arch_select.bzl whl_deps()'s default fallback changes from CPU torch to CUDA torch @ arch_config/arch_select.bzl:55
    • Suggestion (optional): replace the default branch with a no_match_error, or keep explicit cuda12_9_x86/cuda12_9_arm branches plus a default that fails, so a new platform missing a branch errors at bazel analysis time instead of silently installing cu129 packages.
  • rtp-kernel build timestamps drift between the cuda12_arm and cuda12_9 lockfiles @ deps/requirements_lock_cuda12_arm.txt:1
    • Suggestion: align the rtp-kernel version in both lockfiles (the same build artifact should carry the same timestamp), or fix the update_pip flow to always use one timestamp.
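The fail-fast variant suggested in the first P3 item can be expressed with select's no_match_error. A sketch only: the branch values mirror the surrounding discussion and are not verified against the file.

```starlark
# Sketch: make a missing platform branch fail at bazel analysis time
# instead of silently resolving to cu129 wheels.
def whl_deps():
    return select(
        {
            "@rtp_llm//:using_cuda12_9_x86": ["torch==2.8.0+cu129"],
            "@rtp_llm//:using_rocm": ["pyrsmi==0.2.0"],
            # Deliberately no "//conditions:default" branch.
        },
        no_match_error = "whl_deps(): no torch pin for this platform; " +
                         "add an explicit branch rather than relying on a default.",
    )
```

Bazel's select() raises the given no_match_error during analysis whenever no key matches, which surfaces a mis-configured platform long before any wheel is downloaded.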

Checklist Violations (1 fail / 36 total)

RTP-LLM Checklist

  • [H] Tests & CI — Test coverage sufficient: equivalence coverage for large refactors, end-to-end tests for new features → issue: the cuda12_arm dependency set expands substantially; aarch64 wheel compatibility needs verification
    The cuda12_arm dependency list adds flash-attn / flash-attn-3 / deep-ep / deep-gemm / nvidia-cutlass-dsl and more, yet the PR diff contains no smoke/import verification evidence for the cuda12_9_arm platform.

Strengths

  • A large cleanup PR in which every deletion (cuda12_2/cuda12_6/cpu/arm configs, requirements_cpu_arm.txt, etc.) has matching updates in .bazelrc / BUILD / WORKSPACE / arch_select.bzl, leaving no dangling references.
  • The deps/pip.bzl comments are thorough, explaining the PIP_EXTRA_ARGS index selection strategy and why the pip_ppu_torch stub must be kept under the internal/opensource --override_repository scheme.
  • WORKSPACE adds rtp_opensource_deps with a comment clarifying its role separation from rtp_deps under the internal overlay.
  • Docs (install.md / debug.md / profiling.md / 3fs.md / benchmark.md) and perf test scripts have their cuda12_6 references updated in sync, keeping the examples from drifting away from the build system.
  • rtp_llm/BUILD makes utils' implicit aiohttp dependency explicit (rtp_llm/utils/util.py already imports aiohttp), eliminating a fragile dependency carried transitively through other packages.
