
feat(vlm): add Nemotron-Omni RADIO post-load patches #2311

Merged: HuiyingLi merged 2 commits into NVIDIA-NeMo:main from yuekaizhang:n3-omni-fix on May 27, 2026

@yuekaizhang
Contributor

This PR improves nemotron-3-omni:

  • enable_radio_vit_fused_attn(): route RADIO timm ViT attention through F.scaled_dot_product_attention so the (B, H, seq, seq) attention tensor (~5 GiB per block at RADIO-v2-H + dynamic-resolution patch counts) is not materialized.
  • apply_parameter_freezing(): new freeze_video_embedder knob (default False). patch_generator.video_embedder is only exercised on video inputs. In image-only training it is registered with the optimizer but never gets optimizer state (no gradient, so no lazy initialization), and dcp.load on resume raises a missing-key error. The knob is independent of freeze_vision_tower, so the image encoder can stay trainable while the video branch is frozen.
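
To make the size of the unfused (B, H, seq, seq) attention tensor concrete, here is a small back-of-the-envelope sketch. The batch size, head count, dtype width, and token count below are illustrative assumptions, not values taken from this PR; they are chosen only to show how a figure near 5 GiB can arise and that the cost grows quadratically with sequence length.

```python
def attn_matrix_bytes(batch: int, heads: int, seq_len: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to materialize a (B, H, seq, seq) attention tensor."""
    return batch * heads * seq_len * seq_len * bytes_per_elem


GIB = 1024 ** 3

# Assumed values (hypothetical, not from the PR): batch of 1, 16 heads,
# bf16 elements (2 bytes), and 12,800 patch tokens from a dynamic-resolution input.
size = attn_matrix_bytes(batch=1, heads=16, seq_len=12_800)
print(f"{size / GIB:.2f} GiB")  # prints "4.88 GiB"

# Doubling the token count quadruples the tensor, which is why avoiding
# materialization matters most at high patch counts.
print(attn_matrix_bytes(1, 16, 25_600) // size)  # prints 4
```

Routing attention through F.scaled_dot_product_attention lets PyTorch pick a fused kernel where one is available, so this intermediate tensor does not need to be held in memory in full.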

@copy-pr-bot

copy-pr-bot Bot commented May 25, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

HuiyingLi previously approved these changes May 25, 2026
Contributor

@HuiyingLi HuiyingLi left a comment


LGTM thank you @yuekaizhang !

@HuiyingLi
Contributor

/ok to test 29bae71

@HuiyingLi
Contributor

Hi @yuekaizhang could you please fix the ci errors, thank you~

- enable_radio_vit_fused_attn(): route RADIO timm ViT attention through
  F.scaled_dot_product_attention so the (B, H, seq, seq) attention tensor
  (~5 GiB per block at RADIO-v2-H + dynamic-resolution patch counts) is
  not materialized. Mirrors the Megatron-Bridge path's
  vision_config.use_flash_attn=True. No-op on non-RADIO models; invoked
  unconditionally from apply_model_infrastructure().
- apply_parameter_freezing(): new freeze_video_embedder knob (default
  False). patch_generator.video_embedder is only exercised on video
  inputs; on image-only training it sits in the optimizer without state
  (no grad → no lazy init), so dcp.load on resume raises a missing-key
  error. Independent of freeze_vision_tower so the image encoder can
  stay trainable while the video branch is frozen out.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: root <zhangyuekai@foxmail.com>
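
The freezing behavior described in the commit message above can be sketched roughly as follows. This is not the PR's implementation: `freeze_by_substring` and the parameter names in the demo are hypothetical, and the stand-in parameter objects only carry a `requires_grad` flag. With a real PyTorch module, `model.named_parameters()` could be passed in the same way.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List, Tuple


def freeze_by_substring(
    named_params: Iterable[Tuple[str, Any]],
    needle: str = "patch_generator.video_embedder.",
) -> List[str]:
    """Set requires_grad=False on every parameter whose name contains `needle`.

    Hypothetical helper for illustration; returns the names that were frozen.
    """
    frozen = []
    for name, param in named_params:
        if needle in name:
            param.requires_grad = False
            frozen.append(name)
    return frozen


# Stand-in parameters with made-up names; only the video embedder should be frozen.
params = [
    ("vision_model.patch_generator.video_embedder.weight", SimpleNamespace(requires_grad=True)),
    ("vision_model.patch_generator.embedder.weight", SimpleNamespace(requires_grad=True)),
    ("language_model.layers.0.weight", SimpleNamespace(requires_grad=True)),
]

frozen = freeze_by_substring(params)
trainable = [name for name, p in params if p.requires_grad]
print(frozen)
print(trainable)
```

Matching on the `video_embedder.` prefix rather than on `patch_generator` as a whole is what keeps the rest of the patch generator trainable.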
@yuekaizhang
Contributor Author

> Hi @yuekaizhang could you please fix the ci errors, thank you~

Done. Thanks!

@HuiyingLi
Contributor

Thank you @yuekaizhang , I think the codecov is still failing. Would you mind fixing that? Appreciate it!

- enable_radio_vit_fused_attn: flips fused_attn on all blocks, resolves
  vision_model via both top-level and nested model.model paths, no-ops
  when RADIO is absent, and tolerates blocks that lack an attn attribute.
- apply_parameter_freezing(freeze_video_embedder=...): True freezes only
  patch_generator.video_embedder.* and leaves the rest of patch_generator
  trainable; False (default) keeps it trainable.

Pushes codecov/patch on these additions over the 80% threshold.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: root <zhangyuekai@foxmail.com>
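
The behaviors listed in the commit message above (top-level and nested lookup, no-op without RADIO, tolerance for blocks without `attn`) can be sketched as below. This is a minimal illustration, not the code merged in this PR: the function name `enable_fused_attn_sketch` is hypothetical, and reading the blocks from `vision_model.blocks` is an assumption about where they live. The demo uses plain stand-in objects instead of a real model.

```python
from types import SimpleNamespace


def enable_fused_attn_sketch(model) -> int:
    """Set fused_attn=True on each block's attn module; return how many were flipped."""
    # Look for vision_model at the top level first, then under a nested model.model.
    vision_model = getattr(model, "vision_model", None)
    if vision_model is None:
        inner = getattr(model, "model", None)
        vision_model = getattr(inner, "vision_model", None)
    if vision_model is None:
        return 0  # no vision tower found: nothing to do

    flipped = 0
    # Assumption: the transformer blocks are exposed as `vision_model.blocks`.
    for block in getattr(vision_model, "blocks", []):
        attn = getattr(block, "attn", None)
        if attn is None:
            continue  # tolerate blocks that have no attention submodule
        attn.fused_attn = True
        flipped += 1
    return flipped


# Stand-in model: vision_model nested under model.model, one block lacks `attn`.
blocks = [
    SimpleNamespace(attn=SimpleNamespace(fused_attn=False)),
    SimpleNamespace(),
    SimpleNamespace(attn=SimpleNamespace(fused_attn=False)),
]
nested = SimpleNamespace(model=SimpleNamespace(vision_model=SimpleNamespace(blocks=blocks)))

print(enable_fused_attn_sketch(nested))             # prints 2
print(enable_fused_attn_sketch(SimpleNamespace()))  # prints 0
```

Returning a count makes each of the listed cases easy to assert on in a unit test.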
@yuekaizhang
Contributor Author

/ok to test 9bd5d2c

@yuekaizhang
Contributor Author

> Thank you @yuekaizhang , I think the codecov is still failing. Would you mind fixing that? Appreciate it!

@HuiyingLi Fixed it now.

@HuiyingLi HuiyingLi enabled auto-merge (squash) May 27, 2026 00:58
Contributor

@HuiyingLi HuiyingLi left a comment


Thank you!

@HuiyingLi HuiyingLi merged commit 2610809 into NVIDIA-NeMo:main May 27, 2026
77 checks passed
yuekaizhang added a commit to yuekaizhang/RL that referenced this pull request May 27, 2026
Pulls in "feat(vlm): add Nemotron-Omni RADIO post-load patches" (NVIDIA-NeMo/Automodel#2311).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>