Add Cosmos3-Nano LIBERO-10 action-policy SFT recipe, config, eval harness, and doc#61

Open
fwd4 wants to merge 9 commits into NVIDIA:main from fwd4:haolia/libero-action-policy-sft

fwd4 commented Jun 26, 2026
Collaborator

What

Adds the Cosmos3-Nano LIBERO-10 action-policy SFT surface, mirroring the existing DROID counterpart (action_policy_droid_nano + action_policy_droid_repro.toml + launch_sft_action_policy_droid.sh + doc).

Feature (net-new)

  • Experiment config action_policy_libero_nano (gen + action heads from the public Cosmos3-Nano GA base).
  • Dataset LIBEROLeRobotDataset + get_action_libero_sft_dataset — frame_wise_relative rot6d, quantile_rot, concat_view (third-person + wrist), 20 fps.
    • base_dataset tasks.parquet fallback for community LIBERO layouts.
    • Resample-on-decode-failure guard so a single undecodable packed-mp4 frame can't crash a multi-node run (matches i4 behavior).
  • Closed-loop eval harness with vectorized sim, plus a batched /predict_batch path and single-rank no_dist checkpoint load in the policy server.
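The resample-on-decode-failure guard listed above could look roughly like the sketch below. This is a minimal illustration, not the code in this PR: `ResampleOnFailure`, `FlakyFrames`, `max_retries`, and `seed` are hypothetical names, and the real guard may pick the replacement sample differently (for example, from the same episode).

```python
import random


class ResampleOnFailure:
    """Wrap an indexable dataset so a failing sample is replaced by another one.

    Hypothetical helper for illustration; not the class added in this PR.
    """

    def __init__(self, dataset, max_retries=8, seed=0, exceptions=(Exception,)):
        self.dataset = dataset
        self.max_retries = max_retries
        self.exceptions = exceptions
        self._rng = random.Random(seed)

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, index):
        last_error = None
        # One initial attempt plus up to max_retries resampled attempts.
        for _ in range(self.max_retries + 1):
            try:
                return self.dataset[index]
            except self.exceptions as err:
                last_error = err
                # Draw a different random index instead of crashing the worker.
                index = self._rng.randrange(len(self.dataset))
        raise RuntimeError(
            f"no decodable sample after {self.max_retries} retries"
        ) from last_error


class FlakyFrames:
    """Toy stand-in for a packed-mp4 dataset: index 3 never decodes."""

    def __len__(self):
        return 10

    def __getitem__(self, index):
        if index == 3:
            raise ValueError("undecodable frame")
        return {"index": index}


guarded = ResampleOnFailure(FlakyFrames(), seed=0)
sample = guarded[3]  # falls back to some other index rather than raising
```

Bounding the retries keeps a systematically broken shard visible as an error instead of looping forever, while a single bad frame is silently replaced.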

Recipe + doc

  • Canonical examples/toml/sft_config/action_policy_libero_repro.toml + examples/launch_sft_action_policy_libero.sh: lr 5e-5, warmup 500, cycle 16000, global batch 2048 (HSDP 2x8).
  • docs/action_policy_libero_sft.md.
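As a sanity check on the recipe numbers, the sketch below works through the batch arithmetic. It assumes "HSDP 2x8" means 2 replica groups of 8 sharded GPUs (16 GPUs total) and that global batch = GPUs × per-GPU batch × gradient-accumulation steps. The per-GPU batch of 128 is inferred from 2048 / 16 and is not stated in this PR; `BatchLayout` is a hypothetical helper, not part of the recipe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchLayout:
    """Hypothetical helper describing how a global batch is split across GPUs."""

    replicas: int       # HSDP replica groups (assumed: first number in "2x8")
    shards: int         # GPUs per sharded group (assumed: second number in "2x8")
    grad_accum: int     # gradient-accumulation steps per optimizer step
    per_gpu_batch: int  # assumed value, inferred from the arithmetic

    @property
    def world_size(self) -> int:
        return self.replicas * self.shards

    @property
    def global_batch(self) -> int:
        return self.world_size * self.per_gpu_batch * self.grad_accum


# Canonical recipe: 2 nodes x 8 GPUs, no gradient accumulation.
canonical = BatchLayout(replicas=2, shards=8, grad_accum=1, per_gpu_batch=128)

# Single-node alternative: 8 GPUs with grad_accum=2 reaches the same global batch.
single_node = BatchLayout(replicas=1, shards=8, grad_accum=2, per_gpu_batch=128)

print(canonical.world_size, canonical.global_batch)      # 16 2048
print(single_node.world_size, single_node.global_batch)  # 8 2048
```

Under these assumptions both layouts take one optimizer step per 2048 samples, so lr 5e-5, warmup 500, and cycle 16000 would carry over unchanged between them.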

Notes

  • Scoped to LIBERO only; the broader action-dataloader/model changes are intentionally not included here.
  • Based on main.

🤖 Generated with Claude Code

fwd4 and others added 9 commits June 26, 2026 20:59
…ness, and doc

Mirrors the DROID action-policy counterpart (action_policy_droid_nano + repro
toml + launch + doc). Net-new LIBERO feature:
- experiment config: action_policy_libero_nano
- dataset: LIBEROLeRobotDataset + get_action_libero_sft_dataset (frame_wise_relative
  rot6d, quantile_rot, concat_view, 20fps); base_dataset tasks.parquet fallback for
  community LIBERO layouts; resample-on-decode-failure guard (matches i4 behavior)
- closed-loop eval harness (vectorized sim) + batched /predict_batch inference path
  + single-rank no_dist checkpoint load for the policy server
- canonical recipe action_policy_libero_repro.toml + launch_sft_action_policy_libero.sh
  (lr 5e-5, warmup 500, cycle 16000, global batch 2048; ~95% libero_10 500-ep eval)
- docs/action_policy_libero_sft.md

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Trim the toml/config/launch/doc comments (drop SR numbers and experimental
detail), and set the canonical recipe to HSDP 2x8 with grad_accum=1 (global
batch 2048) instead of single-node grad_accum=2.
- action_sft_dataset.py: rebuild as origin/main + libero-only (drop the speedup-era
  ShardedDROIDLeRobotDataset import that broke config load on a clean main).
- remove dataset_reply_action_server.py (GT-replay debug tool, not part of the recipe).
- drop DROID/LoRA references from libero docstrings/comments/doc/launch.