[Data] Apply DataProto to vLLM Inference & Align API with SGLang #967
Code review
Found 1 issue:
- LMFlow/examples/vllm_inference.py, lines 40 to 45 in dee43cf
- LMFlow/src/lmflow/models/hf_model_mixin.py, lines 559 to 581 in dee43cf
Review prepared by @Jingyuan-zhu.
Updated the
Re-checked at 7408430. The docstring/example mismatch I flagged is fully fixed:
- iterative_dpo_aligner: dispatch in-process VLLMInferencer or SGLangInferencer based on inference_engine; fix the DataProto -> text_to_textlist conversion left misaligned by #967 (n>1 rollouts are repeat-interleaved by prepare_inputs_for_inference and need to be ungrouped via meta_info["actual_n_rollouts"]).
- vllm_inferencer: mark MemorySafeVLLMInferencer deprecated with DeprecationWarning; scheduled for removal in lmflow 1.1.0.
- auto_pipeline: relax the iterative_dpo_aligner gate from vllm AND trl AND ray to trl AND (vllm OR sglang); ray is only needed for the opt-in distributed reward inference path.
- setup.py: bump trl 0.8.0 -> trl>=0.11,<0.12; add pybase64 to [sglang] and rich to [trl] to work around upstream packaging gaps (sglang.utils eagerly imports pybase64; trl 0.11.x lazy-imports rich).
- README + 5 localized READMEs: document optional dependency extras and the vllm/sglang environment incompatibility.
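For readers following the n>1 fix in the first item above, here is a minimal sketch of what "ungrouping" repeat-interleaved rollouts could look like. It is not the code from this PR: `ungroup_rollouts` is a hypothetical helper, a plain dict stands in for DataProto, and the `{"input": ..., "output": [...]}` instance layout is assumed from LMFlow's text_to_textlist dataset format.

```python
def ungroup_rollouts(inputs, outputs, n_rollouts):
    """Collapse repeat-interleaved (prompt, output) pairs into one record per prompt.

    Hypothetical helper for illustration only; not part of LMFlow.
    """
    if n_rollouts < 1:
        raise ValueError("n_rollouts must be >= 1")
    if len(inputs) != len(outputs) or len(inputs) % n_rollouts != 0:
        raise ValueError("inputs and outputs must align and divide evenly by n_rollouts")

    instances = []
    for start in range(0, len(inputs), n_rollouts):
        group_inputs = inputs[start:start + n_rollouts]
        # Repeat-interleaving means every prompt inside one group is identical.
        if any(prompt != group_inputs[0] for prompt in group_inputs):
            raise ValueError(f"group starting at index {start} mixes different prompts")
        instances.append({
            "input": group_inputs[0],
            "output": list(outputs[start:start + n_rollouts]),
        })
    # Assumed text_to_textlist layout: one input string, a list of outputs.
    return {"type": "text_to_textlist", "instances": instances}


# Plain-dict stand-in for a DataProto holding 2 rollouts per prompt.
proto_like = {
    "non_tensor_batch": {
        "inputs": ["Q1", "Q1", "Q2", "Q2"],
        "outputs": ["a", "b", "c", "d"],
    },
    "meta_info": {"actual_n_rollouts": 2},
}

dataset = ungroup_rollouts(
    proto_like["non_tensor_batch"]["inputs"],
    proto_like["non_tensor_batch"]["outputs"],
    proto_like["meta_info"]["actual_n_rollouts"],
)
```

Checking that each group holds a single prompt is a cheap guard against exactly the kind of misalignment described above: if the rollout count is wrong, the grouping fails loudly instead of silently pairing outputs with the wrong prompt.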
Overview
Applies DataProto to the vllm inference pipeline, aligning its API with the sglang inferencer introduced in "Unified data exchange protocol across modules" (#960). This unifies data exchange across inference engines and modernizes the vllm integration.
Detailed Description
DataProto integration
- VLLMInferencer now returns DataProto instead of list[VLLMInferenceResultWithInput], with prompts in non_tensor_batch["inputs"] and generated text in non_tensor_batch["outputs"]
- prepare_inputs_for_inference creates DataProto for both sglang and vllm through a unified code path
- __vllm_inference in HFDecoderModel extracts prompts and sampling params from DataProto, converts them to vllm.SamplingParams, and stores outputs back into the proto
- Results are saved and loaded with DataProto.save_to_disk / load_from_disk
- inference_results_path now accepts a directory; results are automatically saved as inference_results.pkl inside it
API alignment with sglang and modernization
- VLLMInferencer now mirrors SGLangInferencer
- Removed the InferencerWithOffloading base class and all Ray-based distributed inference code; vllm >= 0.8 supports data_parallel_size natively in vllm.LLM(), using a multiprocessing backend with no Ray dependency
- Added the --inference_data_parallel_size argument
- tensor_parallel_size × data_parallel_size
- Removed use_beam_search from sampling params (dropped in vLLM V1) and added a deprecation warning
- Fixed deactivate_model_for_inference: the old cleanup code referenced llm_engine.model_executor.driver_worker, which no longer exists in V1
- Added --inference_max_model_len to cap context length (prompt and output) for models with large defaults
- Bumped the vllm requirement from >=0.4.3 to >=0.8.0 in setup.py
Files changed
- src/lmflow/pipeline/vllm_inferencer.py
- src/lmflow/models/hf_decoder_model.py
- src/lmflow/models/hf_model_mixin.py
- src/lmflow/args.py
- src/lmflow/pipeline/sglang_inferencer.py
- src/lmflow/pipeline/utils/memory_safe_vllm_inference.py
- examples/vllm_inference.py
- scripts/run_vllm_inference.sh
- scripts/run_sglang_inference.sh
- setup.py
- tests/pipeline/test_vllm_inferencer.py
Downstream impact
- MemorySafeVLLMInferencer is updated to return DataProto.
- iterative_dpo_aligner.py consumes MemorySafeVLLMInferencer and will need a separate update to handle DataProto instead of list[VLLMInferenceResultWithInput].
Tests
- Ran scripts/run_vllm_inference.sh end-to-end with the target model
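As a complement to the end-to-end run, a lightweight check of the results layout might look like the sketch below. It only illustrates the directory convention described in this PR (results stored as inference_results.pkl inside inference_results_path, with prompts under "inputs" and generations under "outputs"). Everything else is an assumption: a plain dict stands in for DataProto, `pickle` stands in for DataProto.save_to_disk / load_from_disk, and `save_results` / `load_results` are hypothetical helpers.

```python
import pickle
import tempfile
from pathlib import Path

RESULTS_FILENAME = "inference_results.pkl"  # file name stated in the PR description


def save_results(proto_like, results_dir):
    """Write results into results_dir/inference_results.pkl (stand-in for save_to_disk)."""
    results_dir = Path(results_dir)
    results_dir.mkdir(parents=True, exist_ok=True)
    path = results_dir / RESULTS_FILENAME
    with path.open("wb") as f:
        pickle.dump(proto_like, f)
    return path


def load_results(results_dir):
    """Read results back from results_dir (stand-in for load_from_disk)."""
    with (Path(results_dir) / RESULTS_FILENAME).open("rb") as f:
        loaded = pickle.load(f)
    batch = loaded["non_tensor_batch"]
    # Every prompt should have exactly one generated text at the same index.
    if len(batch["inputs"]) != len(batch["outputs"]):
        raise ValueError("inputs and outputs are misaligned")
    return loaded


# Plain-dict stand-in for the DataProto returned by the inferencer.
proto_like = {
    "non_tensor_batch": {
        "inputs": ["What is 2 + 2?", "Name a prime number."],
        "outputs": ["4", "7"],
    },
    "meta_info": {},
}

with tempfile.TemporaryDirectory() as tmp:
    results_dir = Path(tmp) / "results"
    saved_path = save_results(proto_like, results_dir)
    saved_name = saved_path.name
    reloaded = load_results(results_dir)
```

A round-trip check like this does not exercise vllm itself; it only guards the file layout that downstream consumers such as iterative_dpo_aligner.py would rely on.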