
Add inference regression test for Reasoner mode. #59

Merged
lfengad merged 6 commits into main from maoshengl/reasoner-inference-test on Jun 25, 2026

Conversation

@foreverlms

Collaborator

No description provided.

foreverlms and others added 3 commits June 25, 2026 02:15
…token logits goldens

Adds tests/nano_reasoner_inference_smoke_test.py (in the style of
nano_inference_smoke_test) and tests/_reasoner_logits_probe.py. The probe pins
deterministic execution (greedy decoding, CUBLAS_WORKSPACE_CONFIG,
FLASH_ATTENTION_DETERMINISTIC=1, cudnn.deterministic) and monkey-patches
unified_mot._sample_next_token to dump the logits of the first decoded token on rank 0.
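A minimal sketch of what such a probe might look like. It assumes that `_sample_next_token` receives the logits as its first positional argument and that the rank can be read from a torchrun-style `RANK` environment variable; both are assumptions, and `pin_determinism` / `install_first_token_probe` are hypothetical names, not the PR's code. The `:4096:8` value for `CUBLAS_WORKSPACE_CONFIG` is the one PyTorch documents for deterministic cuBLAS; `FLASH_ATTENTION_DETERMINISTIC` is taken from the commit message and is assumed to be read by the attention backend. A stand-in module replaces `unified_mot` so the sketch runs without a model.

```python
import functools
import os
import types


def pin_determinism():
    """Set determinism knobs; must run before CUDA/cuBLAS are initialised."""
    # Workspace config PyTorch documents for deterministic cuBLAS (CUDA >= 10.2).
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
    # Flag named in the PR; assumed to be read by the attention backend.
    os.environ.setdefault("FLASH_ATTENTION_DETERMINISTIC", "1")
    try:
        import torch
    except ImportError:  # keep the sketch runnable without torch installed
        return
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


def install_first_token_probe(module, attr="_sample_next_token"):
    """Wrap module.<attr> so the first call's logits are captured on rank 0.

    Returns (captured, restore): a dict filled on the first call, and a
    callable that puts the original function back.
    """
    original = getattr(module, attr)
    captured = {}
    is_rank0 = os.environ.get("RANK", "0") == "0"  # assumption: torchrun-style RANK

    @functools.wraps(original)
    def probe(logits, *args, **kwargs):
        # Assumption: logits are the first positional argument.
        if is_rank0 and "logits" not in captured:
            # Copy so later in-place ops cannot mutate the captured value.
            if hasattr(logits, "detach"):
                captured["logits"] = logits.detach().cpu().clone()
            else:
                captured["logits"] = list(logits)
        return original(logits, *args, **kwargs)

    setattr(module, attr, probe)

    def restore():
        setattr(module, attr, original)

    return captured, restore


# Usage against a stand-in module whose sampler is greedy (argmax).
fake_mot = types.SimpleNamespace(
    _sample_next_token=lambda logits: max(range(len(logits)), key=logits.__getitem__)
)
pin_determinism()
captured, restore = install_first_token_probe(fake_mot)
first = fake_mot._sample_next_token([0.1, 0.9, 0.3])   # captured
second = fake_mot._sample_next_token([0.7, 0.2, 0.1])  # not captured
restore()
```

Wrapping the sampler instead of editing model code keeps the probe test-only: the model path runs unchanged and only the first call is recorded.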

Two cases, each compared against its own committed golden (exact argmax +
allclose rtol/atol=1e-3): text-only (reasoner.json) and image-conditioned
(reasoner_image.json). Run-to-run logits are bit-identical on a fixed 4-GPU
config (max|Δ|=0.0).

Goldens:
  tests/data/nano_reasoner_inference_smoke_test/first_token_logits_golden.pt
  tests/data/nano_reasoner_inference_smoke_test/first_token_logits_image_golden.pt
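The comparison rule stated above (exact argmax match, then an elementwise allclose with rtol = atol = 1e-3 against each golden) can be sketched in plain Python. With PyTorch tensors the equivalent checks would be `torch.equal(logits.argmax(), golden.argmax())` followed by `torch.allclose(logits, golden, rtol=1e-3, atol=1e-3)`. The helper name `check_against_golden` is hypothetical. The tolerance uses `torch.allclose`'s asymmetric form |a - b| <= atol + rtol * |b|, with the golden value as b, and the function also reports max|Δ| so a bit-identical run shows up as 0.0.

```python
def argmax(values):
    """Index of the largest value (first occurrence on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def check_against_golden(logits, golden, rtol=1e-3, atol=1e-3):
    """Return (passed, max_abs_delta) for first-token logits vs. a golden.

    Rule: exact argmax match, then an allclose-style check using
    torch.allclose's formula |a - b| <= atol + rtol * |b|.
    """
    if not golden:
        raise ValueError("empty golden logits")
    if len(logits) != len(golden):
        raise ValueError(f"shape mismatch: {len(logits)} vs {len(golden)}")
    deltas = [abs(a - b) for a, b in zip(logits, golden)]
    max_delta = max(deltas)
    if argmax(logits) != argmax(golden):
        return False, max_delta  # the greedy next token changed
    within = all(d <= atol + rtol * abs(b) for d, b in zip(deltas, golden))
    return within, max_delta


golden = [1.0, 2.0, 3.0]
identical = check_against_golden([1.0, 2.0, 3.0], golden)  # bit-identical run
drifted = check_against_golden([1.0, 2.0, 3.0005], golden)  # small drift, same argmax
flipped = check_against_golden([3.0, 2.0, 1.0], golden)     # argmax flips
```

Checking argmax first makes a changed greedy token fail loudly even if the individual logits happen to move by less than the tolerance.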

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… goldens)

New GPU job runs tests/nano_reasoner_inference_smoke_test.py with TEST_MAX_GPUS=4
(--num-gpus=4 --levels=2), modeled on inference-smoke; caches the image input's
remote vision_path via COSMOS_DOWNLOAD_CACHE_DIR. Header comment updated (four -> five GPU jobs).
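The commit does not show how `COSMOS_DOWNLOAD_CACHE_DIR` is consumed. One plausible shape, sketched here purely as an assumption, is a URL-keyed file cache: the remote `vision_path` is hashed to a stable filename under the cache directory and downloaded only on a miss, so repeated CI runs reuse the file. `cached_path` and the injected `fetch` callable are hypothetical, and the URL below is a placeholder.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def cached_path(url, fetch, cache_dir=None):
    """Return a local path for `url`, calling fetch(url) -> bytes only on a miss."""
    root = Path(cache_dir or os.environ.get("COSMOS_DOWNLOAD_CACHE_DIR", tempfile.gettempdir()))
    root.mkdir(parents=True, exist_ok=True)
    # Stable filename: SHA-256 of the URL plus the original suffix (e.g. ".jpg").
    suffix = Path(url.split("?", 1)[0]).suffix
    target = root / (hashlib.sha256(url.encode("utf-8")).hexdigest() + suffix)
    if not target.exists():
        data = fetch(url)
        # Write to a side file and rename, so an interrupted download never
        # leaves a truncated file that a later run would treat as a hit.
        tmp = target.with_name(target.name + ".part")
        tmp.write_bytes(data)
        os.replace(tmp, target)
    return target


# Usage with a fake fetcher that records how often it is called.
calls = []


def fake_fetch(url):
    calls.append(url)
    return b"image-bytes"


with tempfile.TemporaryDirectory() as d:
    p1 = cached_path("https://example.com/img/cat.jpg", fake_fetch, cache_dir=d)
    p2 = cached_path("https://example.com/img/cat.jpg", fake_fetch, cache_dir=d)
    content = p2.read_bytes()
```

Injecting `fetch` keeps the caching logic testable without network access; a real runner would pass an HTTP or object-store download function.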

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Target tests/nano_reasoner_inference_smoke_test.py::test_nano_reasoner_image_first_token_logits
instead of the whole file; the text-only variant stays in the file but is not
exercised in CI.
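Selecting one test by its pytest node ID (`file::test_name`) is what narrows the job. A sketch of the CI step as a Python launcher follows. It assumes `--num-gpus` and `--levels` are custom options registered by the repository's `conftest.py` (they are not built-in pytest flags) and that `TEST_MAX_GPUS` is read from the environment. `build_ci_command` is a hypothetical helper, and invoking pytest via `python -m pytest` is also an assumption.

```python
import os
import shlex

TEST_FILE = "tests/nano_reasoner_inference_smoke_test.py"
# Only the image-conditioned case runs in CI; the text-only test stays in the file.
CI_NODE_ID = f"{TEST_FILE}::test_nano_reasoner_image_first_token_logits"


def build_ci_command(num_gpus=4, levels=2):
    """Return (argv, env) for the reasoner-inference-smoke step."""
    argv = [
        "python", "-m", "pytest", CI_NODE_ID,
        f"--num-gpus={num_gpus}",  # assumed custom option from conftest.py
        f"--levels={levels}",      # assumed custom option from conftest.py
    ]
    env = dict(os.environ, TEST_MAX_GPUS=str(num_gpus))
    return argv, env


argv, env = build_ci_command()
print(shlex.join(argv))
# To actually run it: subprocess.run(argv, env=env, check=True)
```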

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
lfengad previously approved these changes Jun 25, 2026
Disambiguates the llava_ov training-loss regression job from the new
reasoner-inference-smoke job.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…sion

Mirrors the reasoner-training-regression rename, giving the loss-vs-goldens
training jobs consistent *-training-regression naming.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
foreverlms requested a review from yy-code-nv June 25, 2026 09:40
Consistent generator-* naming, paralleling reasoner-inference-smoke.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
foreverlms force-pushed the maoshengl/reasoner-inference-test branch from ac07cd5 to 4ea2ebc June 25, 2026 09:56
foreverlms enabled auto-merge (squash) June 25, 2026 11:33
foreverlms disabled auto-merge June 25, 2026 11:43
lfengad merged commit e6c385b into main Jun 25, 2026
15 of 16 checks passed
lfengad deleted the maoshengl/reasoner-inference-test branch June 25, 2026 11:48


3 participants