Skip to content

feat: Make vLLM reasoning tags configurable from YAML#1402

Open
hokuyama0106 wants to merge 5 commits into
NVIDIA-NeMo:mainfrom
hokuyama0106:update-reasoning-tags-config
Open

feat: Make vLLM reasoning tags configurable from YAML#1402
hokuyama0106 wants to merge 5 commits into
NVIDIA-NeMo:mainfrom
hokuyama0106:update-reasoning-tags-config

Conversation

@hokuyama0106
Copy link
Copy Markdown

@hokuyama0106 hokuyama0106 commented May 24, 2026

This updates responses_api_models/vllm_model/app.py so reasoning wrappers/parsers no longer assume <think>...</think>. The tag pair is now configurable via model YAML while preserving the current <think> defaults for existing configs.

Copilot AI and others added 4 commits May 24, 2026 13:40
Agent-Logs-Url: https://github.com/hokuyama0106/Gym/sessions/4fd844ac-7f43-4ab8-92cd-2ee453b53c9d

Co-authored-by: hokuyama0106 <50006381+hokuyama0106@users.noreply.github.com>
Agent-Logs-Url: https://github.com/hokuyama0106/Gym/sessions/4fd844ac-7f43-4ab8-92cd-2ee453b53c9d

Co-authored-by: hokuyama0106 <50006381+hokuyama0106@users.noreply.github.com>
Agent-Logs-Url: https://github.com/hokuyama0106/Gym/sessions/4fd844ac-7f43-4ab8-92cd-2ee453b53c9d

Co-authored-by: hokuyama0106 <50006381+hokuyama0106@users.noreply.github.com>
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 24, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Agent-Logs-Url: https://github.com/hokuyama0106/Gym/sessions/67b6eeed-667b-4547-a561-f280274265c0

Co-authored-by: hokuyama0106 <50006381+hokuyama0106@users.noreply.github.com>
@hokuyama0106 hokuyama0106 changed the title Make vLLM reasoning tags configurable from YAML feat: Make vLLM reasoning tags configurable from YAML May 24, 2026
@hokuyama0106 hokuyama0106 marked this pull request as ready for review May 24, 2026 14:04
@cmunley1
Copy link
Copy Markdown
Contributor

Did you test this on older vllm? I think these were only introduced in vllm 0.19

Also, have you tried thinking token budget with these changes?

https://docs.vllm.ai/en/latest/features/reasoning_outputs/#thinking-budget-control

@hokuyama0106
Copy link
Copy Markdown
Author

hokuyama0106 commented May 25, 2026

Hello, thank you for your reply.

Did you test this on older vllm? I think these were only introduced in vllm 0.19

I am using trl-nemo gym integration and the vllm server is not from the pure vllm liblrary.
The changes of this PR are just parsing the reasoning tags from the server outputs so I think it doesn't conflict any other features.
If you have any points that I should check, please give me those.

https://github.com/huggingface/trl/blob/v0.28.0/trl/scripts/vllm_serve.py
https://huggingface.co/docs/trl/nemo_gym

Also, have you tried thinking token budget with these changes?

No, we haven't, but I can implement it in this PR if you want.

@cmunley1
Copy link
Copy Markdown
Contributor

The integration with TRL is currently unstable, I would recommend using Nemo RL or verl, sorry for the redirection.

@hokuyama0106
Copy link
Copy Markdown
Author

hokuyama0106 commented May 26, 2026

Ok, I will try it.
Should we temporarily keep the current code and close this PR even if the think tags are hardcoded and there is a TODO comment?

https://github.com/NVIDIA-NeMo/Gym/blob/main/responses_api_models/vllm_model/app.py#L626-L630
https://github.com/NVIDIA-NeMo/Gym/blob/main/responses_api_models/vllm_model/app.py#L916-L917

@cmunley1
Copy link
Copy Markdown
Contributor

Do you have a use case where there are different think tags?

I do think it would be good to update for compatibility with vllm 0.19, just need to be careful as this would affect all environments. https://docs.vllm.ai/en/latest/features/reasoning_outputs/#thinking-budget-control

cc @bxyu-nvidia

@hokuyama0106
Copy link
Copy Markdown
Author

hokuyama0106 commented May 26, 2026

Do you have a use case where there are different think tags?

The current vllm server from the trl integration only outputs the raw output including reasoning tags.
We are training models having different reasoning tags from <think> with trl so we use the modification as this PR.

I do think it would be good to update for compatibility with vllm 0.19, just need to be careful as this would affect all environments.

I got it. I fully agree with your suggestion.
It does make more sense to implement the thinking-budget-control in this repo and modify the post-processing of the trl vllm server if we want to use trl.
In that case, I will close this PR and try to update a function with the thinking-budget-control.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants