Add curator-to-SFT JSONL converter #62

Open
lfengad wants to merge 1 commit into main from add-curator-to-sft-jsonl

lfengad (Collaborator) commented Jun 26, 2026

Add cosmos_framework.scripts.curator_to_sft_jsonl, which converts cosmos-curator splitting-pipeline metas_jsonl output directly into the SFT training JSONL format. The converter applies the same hard filters that sft_dataset.py applies silently at train time (duration > 61.0s, per-window frames < 61, optional short-edge), so dataset counts match what training consumes. It also emits a sidecar `<output>.summary.json` with per-reason drop counts and rewrites vision_path relative to the JSONL so datasets stay portable across mounts.
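
To make the behavior described above easier to picture, here is a minimal sketch of the filter, drop-count, and path-rewrite steps. It is not the code in this PR: the input field names (`path`, `duration`, `num_frames`, `width`, `height`), the drop-reason labels, the summary keys, and the `convert` helper are all hypothetical, and the thresholds are read as drop conditions based on the description. Only `vision_path` and the 61.0 s / 61-frame limits come from the text.

```python
import json
import os
from collections import Counter

MAX_DURATION_S = 61.0   # from the PR text; read here as "drop if longer"
MIN_WINDOW_FRAMES = 61  # from the PR text; read here as "drop if fewer"


def drop_reason(rec, min_short_edge=None):
    """Return a drop-reason label, or None if the record passes every filter.

    Field names and reason labels are hypothetical, not the curator schema.
    """
    if rec["duration"] > MAX_DURATION_S:
        return "duration_too_long"
    if rec["num_frames"] < MIN_WINDOW_FRAMES:
        return "too_few_frames"
    if min_short_edge is not None and min(rec["width"], rec["height"]) < min_short_edge:
        return "short_edge_too_small"
    return None


def convert(records, output_jsonl, min_short_edge=None):
    """Filter records, make vision_path relative to the JSONL, count drops."""
    out_dir = os.path.dirname(os.path.abspath(output_jsonl))
    kept, drops = [], Counter()
    for rec in records:
        reason = drop_reason(rec, min_short_edge)
        if reason is not None:
            drops[reason] += 1
            continue
        # A path relative to the JSONL's directory survives a change of mount point.
        rel = os.path.relpath(os.path.abspath(rec["path"]), start=out_dir)
        kept.append({"vision_path": rel})
    summary = {"total": len(records), "kept": len(kept), "dropped": dict(drops)}
    return kept, summary


records = [
    {"path": "/data/run/clips/a.mp4", "duration": 10.0, "num_frames": 240, "width": 1280, "height": 720},
    {"path": "/data/run/clips/b.mp4", "duration": 75.0, "num_frames": 1800, "width": 1280, "height": 720},
    {"path": "/data/run/clips/c.mp4", "duration": 2.0, "num_frames": 48, "width": 1280, "height": 720},
]
kept, summary = convert(records, "/data/run/sft/train.jsonl")
print(json.dumps(kept))
print(json.dumps(summary))
```

Counting drops by reason in the converter, rather than leaving the filtering to train time, is what lets the reported dataset size agree with the number of samples training actually reads.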

Document the path as a new "Create Dataset from a Cosmos-Curator output directory" section in docs/dataset_jsonl.md.

Ported from imaginaire4 MR 9217, with these changes: cosmos3.scripts renamed to cosmos_framework.scripts, OSS SPDX header added, and stale sft_dataset.py line references corrected to 548-550.

Verified: 24/24 tests pass, ruff check/format clean, CLI --help imports.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
lfengad changed the title from "Add curator-to-SFT JSONL converter (ported from i4 MR 9217)" to "Add curator-to-SFT JSONL converter" Jun 26, 2026
lfengad closed this Jun 26, 2026
lfengad reopened this Jun 27, 2026