
@DarkLight1337 (Member) commented Dec 7, 2025

Purpose

  • Prototype an interface, vllm.renderers.RendererLike, to process chat messages into engine inputs (a rough sketch follows this list).
  • Introduce RENDERER_REGISTRY, which lazily registers renderers to avoid circular import problems.
  • Move implementation-specific chat utils to the corresponding renderer in vllm.renderers.
  • Merge init_tokenizer_from_config implementation into cached_tokenizer_from_config.
  • Update TokenizerRegistry to only use lazy imports, even for in-tree tokenizers.
  • Initialize the renderer in InputPreprocessor, replacing the tokenizer initialization inside LLMEngine and AsyncLLM.
  • Replace EngineClient.get_tokenizer() with EngineClient.renderer.get_tokenizer() to avoid unnecessary async.
  • Replace RequestPrompt in online server with the prompt formats in vllm.inputs.
  • Update tests accordingly, and move some tests into a new directory tests/renderers that is run under Async Engine, Inputs, Utils, Worker, Config Test (CPU).
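
A rough sketch of what the interface and lazy registry could look like (illustrative only: the names render_messages and get_tokenizer come from this PR, but the actual signatures, module paths, and implementation details in vllm.renderers may differ):

# Minimal sketch, not the actual vllm.renderers implementation.
import importlib
from typing import Protocol, runtime_checkable


@runtime_checkable
class RendererLike(Protocol):
    """Processes chat messages into engine inputs."""

    def get_tokenizer(self): ...

    def render_messages(self, messages, **kwargs): ...


class RendererRegistry:
    """Maps a renderer mode to an import path so renderers are loaded lazily."""

    def __init__(self) -> None:
        self._renderers: dict[str, str] = {}  # mode -> "module.path:ClassName"

    def register(self, mode: str, import_path: str) -> None:
        self._renderers[mode] = import_path

    def load(self, mode: str) -> type:
        module_name, _, class_name = self._renderers[mode].partition(":")
        return getattr(importlib.import_module(module_name), class_name)


RENDERER_REGISTRY = RendererRegistry()

Storing only import paths means vllm.renderers never imports the renderer modules (or their tokenizer/engine dependencies) at import time, which is what avoids the circular-import problem.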

Towards #22880 and #23873

Future work:

  • Merge CompletionRenderer with this implementation of Renderer.
  • Move apply_chat_template from tokenizer into renderer.
  • Move microbatch tokenizer into renderer.
  • Since each renderer uses a specific tokenizer, we can deprecate TokenizerRegistry in favor of RENDERER_REGISTRY and rename tokenizer_mode to renderer_mode.
  • Split out renderer-specific fields from ModelConfig.

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces a Renderer abstraction to process chat messages, which is a significant and well-executed refactoring. By moving tokenizer-specific and chat template logic into a new vllm.renderers module, the code becomes more modular and maintainable. The introduction of the RendererLike protocol and RendererRegistry provides a clean interface and a lazy registration mechanism. The changes are consistently applied across the codebase, including updates to entrypoints, tests, and input processing. This is a great step towards deprecating the old TokenizerRegistry and simplifying the chat message handling pipeline.

I found one minor issue related to a leftover ThreadPoolExecutor which I've commented on. Otherwise, the changes look solid.

@chatgpt-codex-connector bot left a comment

💡 Codex Review

def cached_tokenizer_from_config(model_config: "ModelConfig", **kwargs):
    return cached_get_tokenizer(
        model_config.tokenizer,
        tokenizer_mode=model_config.tokenizer_mode,
        revision=model_config.tokenizer_revision,
        trust_remote_code=model_config.trust_remote_code,
        # (excerpt truncated here in the review snippet)

P0: cached_tokenizer_from_config uses outdated get_tokenizer signature

cached_tokenizer_from_config still forwards model_config.tokenizer plus a tokenizer_mode kwarg to cached_get_tokenizer, but get_tokenizer now expects (tokenizer_cls, tokenizer_name, …) and no longer accepts tokenizer_mode (registry.py lines 66‑74). Any code path that loads a tokenizer via this helper (e.g., model executor classes calling cached_tokenizer_from_config) will now fail with missing positional/unknown keyword errors instead of initializing a tokenizer, blocking those models entirely.
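
To make the failure mode concrete, here is a hypothetical reproduction; the stand-in signature below is paraphrased from this comment, not copied from registry.py:

# Hypothetical illustration of the mismatch described above; the signature is
# paraphrased from the review text, not the real vLLM code.
def get_tokenizer(tokenizer_cls, tokenizer_name, *, revision=None, trust_remote_code=False):
    """Stand-in for the new (tokenizer_cls, tokenizer_name, ...) signature."""


# The old-style call that cached_tokenizer_from_config still makes:
get_tokenizer(
    "some-org/some-tokenizer",  # binds to tokenizer_cls; tokenizer_name is never supplied
    tokenizer_mode="auto",      # keyword the new signature does not accept
)
# -> raises TypeError instead of returning a tokenizer, which is the P0 above.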


mergify bot commented Dec 7, 2025

Hi @DarkLight1337, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.


for msgs in list_of_messages:
# NOTE: _parse_chat_message_content_parts() currently doesn't
# NOTE: renderer.render_messages() currently doesn't
A Contributor left a comment:

Can we ensure that mm_processor_kwargs can be passed through and used by render_messages()?

@DarkLight1337 (Member, Author) replied:

This was already the case in the previous code, so it is outside the scope of this refactoring PR, which avoids adding new features.

@jeremyteboul (Contributor) left a comment:

Good to see some examples. Let me know if you need help with some of the input MM preprocessing rendering.


mergify bot commented Dec 8, 2025

Hi @DarkLight1337, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

@noooop (Collaborator) commented Dec 8, 2025

The model preprocessing logic is much clearer now.


mergify bot commented Dec 8, 2025

Hi @DarkLight1337, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

mergify bot added the ci/build label Dec 8, 2025
@DarkLight1337 (Member, Author) commented:

/gemini review

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces a Renderer abstraction for processing chat messages, which is a significant and well-executed architectural improvement. By centralizing prompt rendering logic, the entrypoint code in llm.py, async_llm.py, and the various serving_*.py files is substantially simplified and cleaned up. The introduction of RendererLike, RendererRegistry, and specific renderer implementations for different model types is a solid design. The tests have also been diligently updated to reflect these extensive changes. My review identified one minor issue where a safeguard was removed, potentially leading to a less clear error for users of a specific API, but overall, the refactoring is excellent.


_T = TypeVar("_T", bound=type[TokenizerLike])

_VLLM_TOKENIZERS = {
A Collaborator commented:

Just a question: Can the TokenizerMode be automatically injected with _VLLM_TOKENIZERS here?

@DarkLight1337 (Member, Author) replied:

I think yes, but we are going to replace it with renderer_mode later anyway.
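
For reference, a tiny sketch of how the mode could be tied to the registry keys (purely illustrative: the dict entry and helper below are placeholders, not the real _VLLM_TOKENIZERS contents, and as noted this likely becomes moot once renderer_mode replaces tokenizer_mode):

# Illustrative only: validate the mode against the registry keys instead of a
# separately maintained list. The entry below is a placeholder.
_VLLM_TOKENIZERS = {
    "auto": "some.module.path:SomeTokenizerClass",  # placeholder import path
}


def check_tokenizer_mode(mode: str) -> str:
    if mode not in _VLLM_TOKENIZERS:
        raise ValueError(
            f"Unknown tokenizer mode {mode!r}; expected one of {sorted(_VLLM_TOKENIZERS)}"
        )
    return mode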


Labels

  • ci/build
  • frontend
  • gpt-oss (Related to GPT-OSS models)
  • multi-modality (Related to multi-modality (#4194))
  • performance (Performance-related issues)
  • ready (ONLY add when PR is ready to merge/full CI is needed)
  • structured-output
  • tool-calling
  • v1
