
@DarkLight1337 (Member) commented Dec 7, 2025

Purpose

  • Prototype an interface, vllm.renderers.RendererLike, to process chat messages into engine inputs (a rough sketch follows this list).
  • Introduce RENDERER_REGISTRY, which lazily registers renderers to avoid circular import problems.
  • Move implementation-specific chat utils to the corresponding renderer in vllm.renderers.
  • Merge init_tokenizer_from_config implementation into cached_tokenizer_from_config.
  • Update TokenizerRegistry to only use lazy imports, even for in-tree tokenizers.
  • Initialize the renderer in InputPreprocessor, replacing the tokenizer initialization inside LLMEngine and AsyncLLM.
  • Replace EngineClient.get_tokenizer() with EngineClient.renderer.get_tokenizer() to avoid unnecessary async.
  • Replace RequestPrompt in online server with the prompt formats in vllm.inputs.
  • Update tests accordingly, and move some tests into a new directory tests/renderers that is run under Async Engine, Inputs, Utils, Worker, Config Test (CPU).
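
A rough sketch of what the interface and lazy registry could look like (illustrative only: the names render_messages and get_tokenizer come from this PR, but the actual signatures, module paths, and implementation details in vllm.renderers may differ):

# Minimal sketch, not the actual vllm.renderers implementation.
import importlib
from typing import Protocol, runtime_checkable


@runtime_checkable
class RendererLike(Protocol):
    """Processes chat messages into engine inputs."""

    def get_tokenizer(self): ...

    def render_messages(self, messages, **kwargs): ...


class RendererRegistry:
    """Maps a renderer mode to an import path so renderers are loaded lazily."""

    def __init__(self) -> None:
        self._renderers: dict[str, str] = {}  # mode -> "module.path:ClassName"

    def register(self, mode: str, import_path: str) -> None:
        self._renderers[mode] = import_path

    def load(self, mode: str) -> type:
        module_name, _, class_name = self._renderers[mode].partition(":")
        return getattr(importlib.import_module(module_name), class_name)


RENDERER_REGISTRY = RendererRegistry()

Storing only import paths means vllm.renderers never imports the renderer modules (or their tokenizer/engine dependencies) at import time, which is what avoids the circular-import problem.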

Towards #22880 and #23873

Future work:

  • Merge CompletionRenderer with this implementation of Renderer.
  • Move apply_chat_template from tokenizer into renderer.
  • Move microbatch tokenizer into renderer.
  • Since each renderer uses a specific tokenizer, we can deprecate TokenizerRegistry in favor of RENDERER_REGISTRY and rename tokenizer_mode to renderer_mode.
  • Split out renderer-specific fields from ModelConfig.

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces a Renderer abstraction to process chat messages, which is a significant and well-executed refactoring. By moving tokenizer-specific and chat template logic into a new vllm.renderers module, the code becomes more modular and maintainable. The introduction of the RendererLike protocol and RendererRegistry provides a clean interface and a lazy registration mechanism. The changes are consistently applied across the codebase, including updates to entrypoints, tests, and input processing. This is a great step towards deprecating the old TokenizerRegistry and simplifying the chat message handling pipeline.

I found one minor issue related to a leftover ThreadPoolExecutor which I've commented on. Otherwise, the changes look solid.

@chatgpt-codex-connector bot left a comment

💡 Codex Review

def cached_tokenizer_from_config(model_config: "ModelConfig", **kwargs):
    return cached_get_tokenizer(
        model_config.tokenizer,
        tokenizer_mode=model_config.tokenizer_mode,
        revision=model_config.tokenizer_revision,
        trust_remote_code=model_config.trust_remote_code,
        # (excerpt truncated here in the review snippet)

P0: cached_tokenizer_from_config uses outdated get_tokenizer signature

cached_tokenizer_from_config still forwards model_config.tokenizer plus a tokenizer_mode kwarg to cached_get_tokenizer, but get_tokenizer now expects (tokenizer_cls, tokenizer_name, …) and no longer accepts tokenizer_mode (registry.py lines 66‑74). Any code path that loads a tokenizer via this helper (e.g., model executor classes calling cached_tokenizer_from_config) will now fail with missing positional/unknown keyword errors instead of initializing a tokenizer, blocking those models entirely.
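
To make the failure mode concrete, here is a hypothetical reproduction; the stand-in signature below is paraphrased from this comment, not copied from registry.py:

# Hypothetical illustration of the mismatch described above; the signature is
# paraphrased from the review text, not the real vLLM code.
def get_tokenizer(tokenizer_cls, tokenizer_name, *, revision=None, trust_remote_code=False):
    """Stand-in for the new (tokenizer_cls, tokenizer_name, ...) signature."""


# The old-style call that cached_tokenizer_from_config still makes:
get_tokenizer(
    "some-org/some-tokenizer",  # binds to tokenizer_cls; tokenizer_name is never supplied
    tokenizer_mode="auto",      # keyword the new signature does not accept
)
# -> raises TypeError instead of returning a tokenizer, which is the P0 above.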


mergify bot commented Dec 7, 2025

Hi @DarkLight1337, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.


for msgs in list_of_messages:
# NOTE: _parse_chat_message_content_parts() currently doesn't
# NOTE: renderer.render_messages() currently doesn't
A Contributor left a comment:

Can we ensure that mm_processor_kwargs can be passed through and used by render_messages()?

@DarkLight1337 (Member, Author) replied:

This was already the case in the previous code, so it is outside the scope of this refactoring PR, which avoids adding new features.

@jeremyteboul (Contributor) left a comment:

Good to see some examples. Let me know if you need help with some of the input MM preprocessing rendering.


mergify bot commented Dec 8, 2025

Hi @DarkLight1337, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

@noooop (Collaborator) commented Dec 8, 2025

The model preprocessing logic is much clearer now.


mergify bot commented Dec 8, 2025

Hi @DarkLight1337, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

mergify bot added the ci/build label Dec 8, 2025
@DarkLight1337 (Member, Author) commented:

/gemini review

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces a Renderer abstraction for processing chat messages, which is a significant and well-executed architectural improvement. By centralizing prompt rendering logic, the entrypoint code in llm.py, async_llm.py, and the various serving_*.py files is substantially simplified and cleaned up. The introduction of RendererLike, RendererRegistry, and specific renderer implementations for different model types is a solid design. The tests have also been diligently updated to reflect these extensive changes. My review identified one minor issue where a safeguard was removed, potentially leading to a less clear error for users of a specific API, but overall, the refactoring is excellent.


_T = TypeVar("_T", bound=type[TokenizerLike])

_VLLM_TOKENIZERS = {
A Collaborator commented:

Just a question: Can the TokenizerMode be automatically injected with _VLLM_TOKENIZERS here?

@DarkLight1337 (Member, Author) replied:

I think yes, but we are going to replace it with renderer_mode later anyway.
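
For reference, a tiny sketch of how the mode could be tied to the registry keys (purely illustrative: the dict entry and helper below are placeholders, not the real _VLLM_TOKENIZERS contents, and as noted this likely becomes moot once renderer_mode replaces tokenizer_mode):

# Illustrative only: validate the mode against the registry keys instead of a
# separately maintained list. The entry below is a placeholder.
_VLLM_TOKENIZERS = {
    "auto": "some.module.path:SomeTokenizerClass",  # placeholder import path
}


def check_tokenizer_mode(mode: str) -> str:
    if mode not in _VLLM_TOKENIZERS:
        raise ValueError(
            f"Unknown tokenizer mode {mode!r}; expected one of {sorted(_VLLM_TOKENIZERS)}"
        )
    return mode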


Labels

  • ci/build
  • frontend
  • gpt-oss (Related to GPT-OSS models)
  • multi-modality (Related to multi-modality (#4194))
  • performance (Performance-related issues)
  • ready (ONLY add when PR is ready to merge/full CI is needed)
  • structured-output
  • tool-calling
  • v1
