[DO NOT MERGE] Introduce Renderer for processing chat messages (using ModelConfig) #30200

Conversation
Signed-off-by: DarkLight1337 <[email protected]>
Code Review
This pull request introduces a Renderer abstraction to process chat messages, which is a significant and well-executed refactoring. By moving tokenizer-specific and chat template logic into a new vllm.renderers module, the code becomes more modular and maintainable. The introduction of RendererLike protocol and RendererRegistry provides a clean interface and lazy registration mechanism. The changes are consistently applied across the codebase, including updates to entrypoints, tests, and input processing. This is a great step towards deprecating the old TokenizerRegistry and simplifying the chat message handling pipeline.
I found one minor issue related to a leftover ThreadPoolExecutor which I've commented on. Otherwise, the changes look solid.
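To make the pattern the review praises concrete, here is a minimal sketch of a protocol plus a lazily-importing registry. The names `RendererLike` and `RendererRegistry` come from this PR, but the method names and internals below are illustrative assumptions, not vLLM's actual code:

```python
from importlib import import_module
from typing import Protocol, runtime_checkable


@runtime_checkable
class RendererLike(Protocol):
    """Anything that can turn chat messages into engine inputs."""

    def render_messages(self, messages: list[dict]) -> str: ...


class RendererRegistry:
    """Stores import paths rather than classes, so each renderer module
    is only imported on first use; this sidesteps circular imports
    between the registry and the modules that register into it."""

    def __init__(self) -> None:
        self._lazy: dict[str, tuple[str, str]] = {}

    def register_lazy(self, name: str, module: str, qualname: str) -> None:
        self._lazy[name] = (module, qualname)

    def load(self, name: str) -> type:
        module, qualname = self._lazy[name]
        return getattr(import_module(module), qualname)
```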
💡 Codex Review
vllm/vllm/tokenizers/registry.py, lines 135 to 140 in f73bafb:

```python
def cached_tokenizer_from_config(model_config: "ModelConfig", **kwargs):
    return cached_get_tokenizer(
        model_config.tokenizer,
        tokenizer_mode=model_config.tokenizer_mode,
        revision=model_config.tokenizer_revision,
        trust_remote_code=model_config.trust_remote_code,
```
cached_tokenizer_from_config still forwards model_config.tokenizer plus a tokenizer_mode kwarg to cached_get_tokenizer, but get_tokenizer now expects (tokenizer_cls, tokenizer_name, …) and no longer accepts tokenizer_mode (registry.py lines 66‑74). Any code path that loads a tokenizer via this helper (e.g., model executor classes calling cached_tokenizer_from_config) will now fail with missing positional/unknown keyword errors instead of initializing a tokenizer, blocking those models entirely.
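To make the failure mode concrete, the mismatch is roughly of the following shape (a simplified sketch; the real vLLM signatures and call sites may differ):

```python
# Hypothetical new signature, per the review: the class now comes first
# and tokenizer_mode is no longer a parameter.
def get_tokenizer(tokenizer_cls: type, tokenizer_name: str, **kwargs):
    return tokenizer_cls(tokenizer_name, **kwargs)


# The old helper still calls it the old way: one positional argument
# (the model's tokenizer name) plus a tokenizer_mode keyword. The name
# now binds to tokenizer_cls, leaving tokenizer_name unbound.
try:
    get_tokenizer("facebook/opt-125m", tokenizer_mode="auto")
except TypeError as exc:
    print(exc)  # ...missing 1 required positional argument: 'tokenizer_name'
```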
Hi @DarkLight1337, the pre-commit checks have failed. Please run:

```
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
```

Then, commit the changes and push to your branch. For future commits, …
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
```diff
 for msgs in list_of_messages:
-    # NOTE: _parse_chat_message_content_parts() currently doesn't
+    # NOTE: renderer.render_messages() currently doesn't
```
Can we ensure that `mm_processor_kwargs` can be passed to and used by `render_messages`?
This was already the case in the previous code, so it is outside the scope of this refactoring PR, which deliberately avoids adding new features.
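For illustration only, if this were in scope, the plumbing could boil down to merging per-request kwargs over renderer defaults. Everything below (the function name and the `num_crops` key) is a hypothetical sketch, not the PR's API:

```python
from typing import Any


def merge_mm_processor_kwargs(
    defaults: dict[str, Any],
    overrides: dict[str, Any] | None,
) -> dict[str, Any]:
    # Per-request overrides win over the renderer's defaults.
    return {**defaults, **(overrides or {})}


print(merge_mm_processor_kwargs({"num_crops": 4}, {"num_crops": 16}))
# {'num_crops': 16}
```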
jeremyteboul left a comment
Good to see some examples. Let me know if you need help with any of the input multimodal preprocessing rendering.
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Hi @DarkLight1337, the pre-commit checks have failed. Please run:

```
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
```

Then, commit the changes and push to your branch. For future commits, …
Signed-off-by: DarkLight1337 <[email protected]>
The model preprocessing logic is much clearer now.
Signed-off-by: DarkLight1337 <[email protected]>
Hi @DarkLight1337, the pre-commit checks have failed. Please run:

```
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
```

Then, commit the changes and push to your branch. For future commits, …
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
/gemini review
Code Review
This pull request introduces a Renderer abstraction for processing chat messages, which is a significant and well-executed architectural improvement. By centralizing prompt rendering logic, the entrypoint code in llm.py, async_llm.py, and the various serving_*.py files is substantially simplified and cleaned up. The introduction of RendererLike, RendererRegistry, and specific renderer implementations for different model types is a solid design. The tests have also been diligently updated to reflect these extensive changes. My review identified one minor issue where a safeguard was removed, potentially leading to a less clear error for users of a specific API, but overall, the refactoring is excellent.
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
```python
_T = TypeVar("_T", bound=type[TokenizerLike])


_VLLM_TOKENIZERS = {
```
Just a question: can the TokenizerMode be automatically injected from _VLLM_TOKENIZERS here?
I think yes, but we are going to replace it with a renderer mode later anyway.
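A sketch of what such automatic injection could look like, assuming `_VLLM_TOKENIZERS` maps mode names to lazily-imported classes (the mapping shape and module paths below are assumptions based on the snippet above, not the actual registry):

```python
from typing import Literal

# Assumed shape of the registry: mode name -> (module path, class name),
# in the lazy-import style this PR adopts. The paths are illustrative.
_VLLM_TOKENIZERS: dict[str, tuple[str, str]] = {
    "auto": ("vllm.tokenizers.hf", "HfTokenizer"),
    "mistral": ("vllm.tokenizers.mistral", "MistralTokenizer"),
}

# One way to "inject" the mode automatically: keep a Literal for static
# checking and assert at import time that it matches the registry keys.
TokenizerMode = Literal["auto", "mistral"]

assert set(TokenizerMode.__args__) == set(_VLLM_TOKENIZERS)
```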
Signed-off-by: DarkLight1337 <[email protected]>
Purpose

- Introduce a new abstraction, `vllm.renderers.RendererLike`, to process chat messages into engine inputs.
- Add `RENDERER_REGISTRY`, which lazily registers renderers to avoid circular import problems.
- Move the `vllm.renderers.init_tokenizer_from_config` implementation into `cached_tokenizer_from_config`.
- Update `TokenizerRegistry` to only use lazy imports, even for in-tree tokenizers.
- Initialize the renderer inside `InputPreprocessor`, replacing the tokenizer initialization inside `LLMEngine` and `AsyncLLM`.
- Replace `EngineClient.get_tokenizer()` with `EngineClient.renderer.get_tokenizer()` to avoid unnecessary async (see the sketch after this list).
- Unify `RequestPrompt` in the online server with the prompt formats in `vllm.inputs`.
- Add `tests/renderers`, which is run under "Async Engine, Inputs, Utils, Worker, Config Test (CPU)".

Towards #22880 and #23873
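As a rough illustration of the `get_tokenizer()` change above: the class and attribute names follow the PR description, but the bodies are stand-ins, not vLLM's implementation.

```python
import asyncio


class Renderer:
    """Stand-in renderer that owns the tokenizer."""

    def __init__(self, tokenizer: str) -> None:
        self._tokenizer = tokenizer

    def get_tokenizer(self) -> str:
        return self._tokenizer


class EngineClient:
    def __init__(self, renderer: Renderer) -> None:
        self.renderer = renderer

    # Before: an async method even though nothing inside awaits.
    async def get_tokenizer(self) -> str:
        return self.renderer.get_tokenizer()


async def main() -> None:
    client = EngineClient(Renderer("fake-tokenizer"))
    old = await client.get_tokenizer()     # unnecessary async hop
    new = client.renderer.get_tokenizer()  # plain attribute access
    assert old == new


asyncio.run(main())
```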
Future work:

- Replace `CompletionRenderer` with this implementation of `Renderer`.
- Move `apply_chat_template` from the tokenizer into the renderer.
- Deprecate `TokenizerRegistry` in favor of `RENDERER_REGISTRY`, and rename `tokenizer_mode` to `renderer_mode` in `ModelConfig`.

Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist

- (Optional) Documentation update, such as updating `supported_models.md` and `examples` for a new model.