
release: sync from i4 8ab710e8 #54

Open
yy-code-nv wants to merge 5 commits into main from release/2026-06-23-8ab710e8


@yy-code-nv

Collaborator

Sync cosmos-framework with imaginaire4 commit 8ab710e8.

Highlights:

  • Promote the new sequence_packing package (modalities, runtime, natten, packers, types, mrope, temporal_causal) under cosmos_framework/data/vfm/sequence_packing/, replacing the previous monolithic sequence_packing.py and the unified_3dmrope_utils helper (both removed).
  • Update consumers (mot/, callbacks/, data/vfm/, omni_mot_model.py) to the new sequence_packing API.
  • Update sft_dataset.py to import add_special_tokens via the new deep path (cosmos_framework.data.vfm.sequence_packing.modalities) since it is no longer re-exported from the package root.
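For code outside this PR that still imports add_special_tokens from the package root, the last bullet implies a one-line change. The following is a minimal sketch: the module path is the one named above, while the try/except guard is an assumption added only so the snippet also runs where cosmos_framework is not installed.

```python
# Sketch only: the module path comes from the PR description; the guard is an
# assumption so the snippet degrades gracefully without cosmos_framework.
try:
    # New deep import path; the package root no longer re-exports this name.
    from cosmos_framework.data.vfm.sequence_packing.modalities import (
        add_special_tokens,
    )
except ImportError:
    # cosmos_framework is missing, or the checkout predates this sync.
    add_special_tokens = None
```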

@yy-code-nv yy-code-nv marked this pull request as ready for review June 25, 2026 09:52
@yy-code-nv yy-code-nv force-pushed the release/2026-06-23-8ab710e8 branch from 2f04b33 to b9f1224 Compare June 25, 2026 09:53
yy-code-nv and others added 5 commits June 25, 2026 20:11
Sync cosmos-framework with imaginaire4 commit 8ab710e8.

Highlights:
- Promote the new sequence_packing package (modalities, runtime, natten,
  packers, types, mrope, temporal_causal) under
  cosmos_framework/data/vfm/sequence_packing/, replacing the previous
  monolithic sequence_packing.py and the unified_3dmrope_utils helper
  (both removed).
- Update consumers (mot/, callbacks/, data/vfm/, omni_mot_model.py) to
  the new sequence_packing API.
- Update sft_dataset.py to import add_special_tokens via the new deep
  path (cosmos_framework.data.vfm.sequence_packing.modalities) since it
  is no longer re-exported from the package root.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…edMonitor use

- callbacks/dit_image_sample.py: shipped (added to mapping include_files).
  Fixes ModuleNotFoundError in cosmos_framework.callbacks.dit_image_sample.
- configs/base/defaults/callbacks.py: orphaned dataloader_speed=L(
  DetailedDataLoadingSpeedMonitor)(...) entry stripped. The i4 source
  IGNORE blocks now wrap both the import and both usage sites.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ameterize NPROC

- model/vfm/tokenizers/stable_diffusion_vae_8x8.py: shipped (fixes
  ModuleNotFoundError on cosmos_framework.model.vfm.tokenizers.
  stable_diffusion_vae_8x8 imported by configs/base/defaults/tokenizer.py).
- configs/base/experiment/sft/models/{nano,super}_model_config.py: drop
  5 DiT-only fields (high_sigma_ratio, high_sigma_timesteps_{min,max},
  use_high_sigma_strategy{,_action}) that target DiTRectifiedFlowTrainingConfig
  (in the unshipped llm/dit subtree) but were being composed onto
  RectifiedFlowTrainingConfig — OmegaConf rejected the unknown keys.
- tests/nano_training_smoke_test.py: parameterize on TEST_MAX_GPUS so the
  smoke can run on 4-GPU allocations. Skip threshold, @pytest.mark.gpus,
  NPROC_PER_NODE, and the post-train torchrun all use MAX_GPUS instead of
  the hardcoded 8.
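The model-config item above comes down to a schema with a fixed field set refusing names it does not declare. The sketch below shows the same failure mode with plain dataclasses standing in for OmegaConf structured configs. TrainingConfigSketch, its learning_rate field, and unknown_keys are hypothetical names for illustration; only the five dropped field names come from this commit.

```python
from dataclasses import dataclass, fields

# The five DiT-only keys named in the commit message.
DIT_ONLY_FIELDS = (
    "high_sigma_ratio",
    "high_sigma_timesteps_min",
    "high_sigma_timesteps_max",
    "use_high_sigma_strategy",
    "use_high_sigma_strategy_action",
)


@dataclass
class TrainingConfigSketch:
    """Hypothetical stand-in for RectifiedFlowTrainingConfig."""

    learning_rate: float = 1e-4  # hypothetical field, not from the source


def unknown_keys(overrides, schema):
    """Return the override keys that the schema does not declare, sorted."""
    declared = {f.name for f in fields(schema)}
    return sorted(key for key in overrides if key not in declared)


overrides = {"learning_rate": 3e-5, **{name: None for name in DIT_ONLY_FIELDS}}

# Composing the DiT-only keys onto the narrower schema fails outright.
try:
    TrainingConfigSketch(**overrides)
    rejected = False
except TypeError:
    rejected = True

# Dropping the offending keys lets the remaining overrides apply.
offending = unknown_keys(overrides, TrainingConfigSketch)
cleaned = {k: v for k, v in overrides.items() if k not in offending}
config = TrainingConfigSketch(**cleaned)
```

The commit removes the keys from the config files themselves rather than filtering at runtime; the filter here only makes the offending set visible.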

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rization

- cosmos_framework/model/vfm/mot/context_parallel_test.py: bring back
  under release authority. The IGNORE-stripped version drops the
  test_replicated_attention_io_cp_matches_single_rank_attention function
  and the two interactive.* imports it depends on (unreleased subtree).
  Other two tests + helpers ship cleanly.
- tests/nano_training_smoke_test.py: revert TEST_MAX_GPUS parameterization
  back to hardcoded 8. CI server has 8 cards available so the 4-GPU
  workaround isn't needed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The 2026-06-09 GB200 goldens drifted ~5e-3 in loss after the rectified-flow
sigma-sampling refactor (the ``t = 1 - t_raw`` flip moved into the sampler
via per-sample ``shifts``). The shift is intentional; recapture against
the current code rather than restoring the legacy form.

Captured 2026-06-25 on a 4 × NVIDIA GB200 node with seed 42 under
``--deterministic``. Loss reproduces bit-exact across all 10 iters;
grad_norm is also deterministic here because compile.enabled=false in
nano_model_config, so those values are pinned too. Flip them to None if a
future change re-enables compile and the all-rank reduction becomes
non-deterministic.
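One way to read the pinning convention above is that a golden set to None means "do not check this metric". The sketch below assumes per-iteration lists and exact equality. check_goldens and the numbers are hypothetical placeholders, not the captured goldens or the smoke test's actual comparison code.

```python
def check_goldens(observed, goldens):
    """Compare per-iteration metrics against pinned goldens.

    A golden of None disables the check for that metric. Pinned metrics must
    match exactly, mirroring a bit-exact comparison under deterministic mode.
    Returns a list of human-readable mismatch messages (empty means pass).
    """
    mismatches = []
    for metric, expected in goldens.items():
        if expected is None:
            continue  # metric intentionally unpinned
        actual = observed[metric]
        if len(actual) != len(expected):
            mismatches.append(
                f"{metric}: expected {len(expected)} iters, got {len(actual)}"
            )
            continue
        for step, (got, want) in enumerate(zip(actual, expected)):
            if got != want:
                mismatches.append(f"{metric}[{step}]: got {got!r}, want {want!r}")
    return mismatches


# Placeholder values for illustration only.
observed = {"loss": [1.0, 0.5], "grad_norm": [2.0, 1.5]}
pinned = {"loss": [1.0, 0.5], "grad_norm": [2.0, 1.5]}
unpinned_grad_norm = {"loss": [1.0, 0.5], "grad_norm": None}
```

With grad_norm set to None, a run whose grad_norm differs still passes as long as the loss matches.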

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@yy-code-nv yy-code-nv force-pushed the release/2026-06-23-8ab710e8 branch from b9f1224 to 5a04d50 Compare June 25, 2026 12:11
3 participants