diff --git a/README.md b/README.md
index e10839eee..9c42800f8 100644
--- a/README.md
+++ b/README.md
@@ -102,6 +102,27 @@ conda install mpi4py
 pip install -e .
 ```
 
+#### Optional dependencies
+
+The base install above is enough for full / LoRA / LISA finetuning and HF-backend inference. Features below are gated behind extras — install only what you need:
+
+| Extra | Enables | Install |
+| --------------- | --------------------------------------------- | -------------------------------------- |
+| `vllm` | vLLM-backed inference and iterative DPO | `pip install -e ".[vllm]"` |
+| `sglang` | SGLang-backed inference and iterative DPO | `pip install -e ".[sglang]"` |
+| `trl` | DPO / iterative DPO training | `pip install -e ".[trl]"` |
+| `deepspeed` | DeepSpeed integration | `pip install -e ".[deepspeed]"` |
+| `flash_attn` | Flash Attention 2 | `pip install -e ".[flash_attn]"` |
+| `ray` | Distributed reward-model inference | `pip install -e ".[ray]"` |
+| `multimodal` | Multimodal models | `pip install -e ".[multimodal]"` |
+| `gradio` | Gradio chatbot UI | `pip install -e ".[gradio]"` |
+| `flask` | Flask deployment | `pip install -e ".[flask]"` |
+
+Multiple extras can be combined: `pip install -e ".[vllm,trl]"` for iterative DPO with vLLM, or `".[sglang,trl]"` for the SGLang variant.
+
+> [!IMPORTANT]
+> vLLM and SGLang depend on incompatible CUDA / PyTorch versions and should not be installed into the same environment. If you need both, create separate conda envs (e.g. `lmflow-vllm` and `lmflow-sglang`).
+
 Looking for a previous version?
 
 ```bash
diff --git a/docs/readme/README_es.md b/docs/readme/README_es.md
index 7defd8422..7e7565bf1 100644
--- a/docs/readme/README_es.md
+++ b/docs/readme/README_es.md
@@ -47,18 +47,38 @@ Una caja de herramientas extensible, conveniente y eficiente para ajustar modelo
 ## Quick Start
 
 ### Setup
-Nuestro repositorio ha sido probado en Linux (Ubuntu 20.04). Las otras plataformas de sistemas operativos (macOS, Windows) aún no han sido completamente probadas, por lo que pueden surgir algunos errores inesperados. Se recomienda probar primero en Linux/Windows WSL o utilizar Google Colab para experimentar.
+Nuestro repositorio ha sido probado en Linux (Ubuntu 20.04). Las otras plataformas (macOS, Windows) aún no han sido completamente probadas, por lo que pueden surgir errores inesperados. Se recomienda probar primero en Linux / Windows WSL, o utilizar Google Colab.
 
-Para CUDA 10.3-11.7, se recomienda utilizar `v0.0.5` o versiones anteriores. Para CUDA superior a 11.7, por favor, utilice nuestra rama estable `>= v0.0.6` para una mejor experiencia.
 ```bash
-git clone https://github.com/OptimalScale/LMFlow.git
+git clone -b v1.0.0 https://github.com/OptimalScale/LMFlow.git
 cd LMFlow
 conda create -n lmflow python=3.9 -y
 conda activate lmflow
 conda install mpi4py
-bash install.sh
+pip install -e .
 ```
 
+#### Dependencias opcionales
+
+La instalación base es suficiente para fine-tuning completo / LoRA / LISA y para inferencia con el backend de HuggingFace. Las siguientes funciones se habilitan mediante *extras* — instale solo lo que necesite:
+
+| Extra | Habilita | Comando |
+| ------------ | ------------------------------------------------- | ---------------------------------- |
+| `vllm` | Inferencia con vLLM e iterative DPO | `pip install -e ".[vllm]"` |
+| `sglang` | Inferencia con SGLang e iterative DPO | `pip install -e ".[sglang]"` |
+| `trl` | Entrenamiento DPO / iterative DPO | `pip install -e ".[trl]"` |
+| `deepspeed` | Integración con DeepSpeed | `pip install -e ".[deepspeed]"` |
+| `flash_attn` | Flash Attention 2 | `pip install -e ".[flash_attn]"` |
+| `ray` | Inferencia distribuida del modelo de recompensa | `pip install -e ".[ray]"` |
+| `multimodal` | Modelos multimodales | `pip install -e ".[multimodal]"` |
+| `gradio` | Interfaz de chatbot con Gradio | `pip install -e ".[gradio]"` |
+| `flask` | Despliegue con Flask | `pip install -e ".[flask]"` |
+
+Los extras se pueden combinar: `pip install -e ".[vllm,trl]"` para iterative DPO con vLLM, o `".[sglang,trl]"` para la variante con SGLang.
+
+> [!IMPORTANT]
+> vLLM y SGLang dependen de versiones incompatibles de CUDA / PyTorch y no deberían instalarse en el mismo entorno. Si necesita ambos, cree entornos conda separados (por ejemplo, `lmflow-vllm` y `lmflow-sglang`).
+
 ### Prepare Dataset
 Por favor, consulta nuestra [documentación oficial (en inglés)](https://optimalscale.github.io/LMFlow/examples/DATASETS.html). La documentación oficial se encuentra actualmente en proceso de traducción, te pedimos paciencia mientras tanto.
 
diff --git a/docs/readme/README_hindi.md b/docs/readme/README_hindi.md
index cf9049b55..2698f17f8 100644
--- a/docs/readme/README_hindi.md
+++ b/docs/readme/README_hindi.md
@@ -65,18 +65,38 @@
 ## Quick Start
 
 ### Setup
-हमारे रेपो को Linux (Ubuntu 20.04) पर परीक्षण किया गया है। अन्य ऑपरेटिंग सिस्टम प्लेटफॉर्म (MacOS, Windows) को पूरी तरह से परीक्षण नहीं किया गया है, इसलिए कुछ अपेक्षित त्रुटियों का सामना कर सकता है। Linux/Windows WSL पर प्रयोग करने या Google Colab का उपयोग करके अनुभव करने की सिफारिश की जाती है।
+हमारे रेपो को Linux (Ubuntu 20.04) पर परीक्षण किया गया है। अन्य ऑपरेटिंग सिस्टम प्लेटफॉर्म (macOS, Windows) को पूरी तरह से परीक्षण नहीं किया गया है, इसलिए कुछ अप्रत्याशित त्रुटियों का सामना हो सकता है। पहले Linux या Windows WSL पर प्रयोग करने, या Google Colab का उपयोग करने की सिफारिश की जाती है।
 
-CUDA 10.3-11.7 के लिए, `v0.0.5` या इससे पुराने संस्करणों का उपयोग करने की सिफारिश की जाती है। 11.7 से अधिक CUDA के लिए, बेहतर अनुभव के लिए हमारी स्थिर शाखा `>= v0.0.6` का उपयोग करें।
 ```bash
-git clone https://github.com/OptimalScale/LMFlow.git
+git clone -b v1.0.0 https://github.com/OptimalScale/LMFlow.git
 cd LMFlow
 conda create -n lmflow python=3.9 -y
 conda activate lmflow
 conda install mpi4py
-bash install.sh
+pip install -e .
 ```
 
+#### वैकल्पिक निर्भरताएँ (Optional dependencies)
+
+ऊपर दिया गया बेस इंस्टॉल पूर्ण / LoRA / LISA फ़ाइन-ट्यूनिंग और HuggingFace बैकएंड इंफरेंस के लिए पर्याप्त है। नीचे दी गई सुविधाएँ *extras* के माध्यम से सक्षम होती हैं — केवल वही इंस्टॉल करें जिसकी आपको आवश्यकता है:
+
+| Extra | क्या सक्षम होता है | इंस्टॉल कमांड |
+| ------------ | ------------------------------------------------ | ------------------------------------ |
+| `vllm` | vLLM-आधारित इंफरेंस और iterative DPO | `pip install -e ".[vllm]"` |
+| `sglang` | SGLang-आधारित इंफरेंस और iterative DPO | `pip install -e ".[sglang]"` |
+| `trl` | DPO / iterative DPO ट्रेनिंग | `pip install -e ".[trl]"` |
+| `deepspeed` | DeepSpeed इंटीग्रेशन | `pip install -e ".[deepspeed]"` |
+| `flash_attn` | Flash Attention 2 | `pip install -e ".[flash_attn]"` |
+| `ray` | डिस्ट्रिब्यूटेड reward model इंफरेंस | `pip install -e ".[ray]"` |
+| `multimodal` | मल्टीमॉडल मॉडल | `pip install -e ".[multimodal]"` |
+| `gradio` | Gradio चैटबॉट UI | `pip install -e ".[gradio]"` |
+| `flask` | Flask डिप्लॉयमेंट | `pip install -e ".[flask]"` |
+
+Extras को मिलाया जा सकता है: vLLM के साथ iterative DPO के लिए `pip install -e ".[vllm,trl]"`, या SGLang संस्करण के लिए `".[sglang,trl]"`।
+
+> [!IMPORTANT]
+> vLLM और SGLang असंगत CUDA / PyTorch संस्करणों पर निर्भर हैं और इन्हें एक ही environment में इंस्टॉल नहीं करना चाहिए। यदि दोनों चाहिए, तो अलग-अलग conda environments बनाएँ (जैसे `lmflow-vllm` और `lmflow-sglang`)।
+
 ### Prepare Dataset
 आप हमारी [आधिकारिक दस्तावेज़ीकरण (अंग्रेजी में)](https://optimalscale.github.io/LMFlow/examples/DATASETS.html) को देखें। आधिकारिक दस्तावेज़ीकरण अनुवाद के प्रक्रिया में है, कृपया धैर्य रखें।
 
diff --git a/docs/readme/README_jp.md b/docs/readme/README_jp.md
index eecd088a0..adee8211e 100644
--- a/docs/readme/README_jp.md
+++ b/docs/readme/README_jp.md
@@ -66,17 +66,38 @@
 ## Quick Start
 
 ### Setup
-私たちのリポジトリはすでにLinux(Ubuntu 20.04)で包括的なテストを完了しています。他のオペレーティングシステムプラットフォーム(MacOS、Windows)は完全にテストされていませんので、予期しないエラーが発生する可能性があります。まずLinux/Windows WSLで試してみるか、またはGoogle Colabをご利用ください。
-CUDA 10.3-11.7については、`v0.0.5`またはそれ以前のバージョンを使用することをお勧めします。11.7よりも新しいCUDAの場合は、より良い体験を得るために、安定したブランチ`>= v0.0.6`を使用してください。
+私たちのリポジトリは Linux(Ubuntu 20.04)でテスト済みです。他の OS プラットフォーム(macOS、Windows)は完全にはテストされていないため、予期しないエラーが発生する可能性があります。まず Linux または Windows WSL で試すか、Google Colab をご利用ください。
+
 ```bash
-git clone https://github.com/OptimalScale/LMFlow.git
+git clone -b v1.0.0 https://github.com/OptimalScale/LMFlow.git
 cd LMFlow
 conda create -n lmflow python=3.9 -y
 conda activate lmflow
 conda install mpi4py
-bash install.sh
+pip install -e .
 ```
 
+#### オプションの依存関係 (Optional dependencies)
+
+上記のベースインストールは、フル / LoRA / LISA ファインチューニングおよび HuggingFace バックエンドでの推論に十分です。以下の機能は *extras* で有効化されます — 必要なものだけインストールしてください:
+
+| Extra | 有効になる機能 | インストール |
+| ------------ | --------------------------------------------- | ---------------------------------- |
+| `vllm` | vLLM バックエンドの推論および iterative DPO | `pip install -e ".[vllm]"` |
+| `sglang` | SGLang バックエンドの推論および iterative DPO | `pip install -e ".[sglang]"` |
+| `trl` | DPO / iterative DPO トレーニング | `pip install -e ".[trl]"` |
+| `deepspeed` | DeepSpeed 統合 | `pip install -e ".[deepspeed]"` |
+| `flash_attn` | Flash Attention 2 | `pip install -e ".[flash_attn]"` |
+| `ray` | 分散 reward model 推論 | `pip install -e ".[ray]"` |
+| `multimodal` | マルチモーダルモデル | `pip install -e ".[multimodal]"` |
+| `gradio` | Gradio チャットボット UI | `pip install -e ".[gradio]"` |
+| `flask` | Flask デプロイ | `pip install -e ".[flask]"` |
+
+extras は組み合わせ可能です: vLLM での iterative DPO には `pip install -e ".[vllm,trl]"`、SGLang 版には `".[sglang,trl]"`。
+
+> [!IMPORTANT]
+> vLLM と SGLang は互換性のない CUDA / PyTorch バージョンに依存しており、同じ環境にインストールすべきではありません。両方が必要な場合は、別々の conda 環境を作成してください(例: `lmflow-vllm` と `lmflow-sglang`)。
+
 ### Prepare Dataset
 当社の[公式ドキュメント(英語版)](https://optimalscale.github.io/LMFlow/examples/DATASETS.html)を参照してください。公式ドキュメントは現在翻訳中ですので、しばらくお待ちください。
 
diff --git a/docs/readme/README_ko.md b/docs/readme/README_ko.md
index 0b365e431..341da9124 100644
--- a/docs/readme/README_ko.md
+++ b/docs/readme/README_ko.md
@@ -65,17 +65,38 @@
 ## Quick Start
 
 ### Setup
-저희의 Repo는 이미 리눅스 (우분투 20.04)에서 완전한 테스트가 이루어졌습니다. 다른 운영 체제 플랫폼 (맥OS, 윈도우)은 아직 완전히 테스트되지 않았으므로 예상치 못한 오류가 발생할 수 있습니다. 먼저 리눅스/윈도우 WSL에서 사용해보거나 Google Colab을 사용하는 것을 권장합니다.
-CUDA 10.3-11.7에 대해서는 `v0.0.5` 및 그 이전 버전을 사용하는 것이 좋습니다. 11.7보다 큰 CUDA의 경우, 더 나은 경험을 위해 우리의 stable 브랜치인 `>= v0.0.6` 을 사용하십시오.
+저희의 Repo는 리눅스 (우분투 20.04)에서 테스트되었습니다. 다른 운영 체제 플랫폼 (macOS, Windows)은 아직 완전히 테스트되지 않았으므로 예상치 못한 오류가 발생할 수 있습니다. 먼저 리눅스 또는 Windows WSL에서 사용해 보시거나 Google Colab을 사용하는 것을 권장합니다.
+
 ```bash
-git clone https://github.com/OptimalScale/LMFlow.git
+git clone -b v1.0.0 https://github.com/OptimalScale/LMFlow.git
 cd LMFlow
 conda create -n lmflow python=3.9 -y
 conda activate lmflow
 conda install mpi4py
-bash install.sh
+pip install -e .
 ```
 
+#### 선택적 의존성 (Optional dependencies)
+
+위의 기본 설치는 전체 / LoRA / LISA 파인튜닝 및 HuggingFace 백엔드 추론에 충분합니다. 다음 기능들은 *extras* 를 통해 활성화됩니다 — 필요한 것만 설치하세요:
+
+| Extra | 활성화되는 기능 | 설치 명령어 |
+| ------------ | ---------------------------------------------- | ---------------------------------- |
+| `vllm` | vLLM 기반 추론 및 iterative DPO | `pip install -e ".[vllm]"` |
+| `sglang` | SGLang 기반 추론 및 iterative DPO | `pip install -e ".[sglang]"` |
+| `trl` | DPO / iterative DPO 학습 | `pip install -e ".[trl]"` |
+| `deepspeed` | DeepSpeed 통합 | `pip install -e ".[deepspeed]"` |
+| `flash_attn` | Flash Attention 2 | `pip install -e ".[flash_attn]"` |
+| `ray` | 분산 reward model 추론 | `pip install -e ".[ray]"` |
+| `multimodal` | 멀티모달 모델 | `pip install -e ".[multimodal]"` |
+| `gradio` | Gradio 챗봇 UI | `pip install -e ".[gradio]"` |
+| `flask` | Flask 배포 | `pip install -e ".[flask]"` |
+
+extras는 조합할 수 있습니다: vLLM 기반 iterative DPO는 `pip install -e ".[vllm,trl]"`, SGLang 버전은 `".[sglang,trl]"`.
+
+> [!IMPORTANT]
+> vLLM과 SGLang은 호환되지 않는 CUDA / PyTorch 버전에 의존하므로 동일한 환경에 설치해서는 안 됩니다. 둘 다 필요한 경우 별도의 conda 환경을 만드세요 (예: `lmflow-vllm`, `lmflow-sglang`).
+
 ### Prepare Dataset
 저희의 [공식 문서(영문)](https://optimalscale.github.io/LMFlow/examples/DATASETS.html) 를 참고해 주세요. 공식 문서는 현재 번역 중이며, 조금만 기다려 주시기 바랍니다.
 
diff --git a/docs/readme/README_zh-hans.md b/docs/readme/README_zh-hans.md
index c450d19a4..d5300e3cf 100644
--- a/docs/readme/README_zh-hans.md
+++ b/docs/readme/README_zh-hans.md
@@ -60,18 +60,38 @@
 ## 快速上手
 
 ### 安装
-我们的Repo已经在Linux(Ubuntu 20.04)上进行了测试。其他操作系统平台(MacOS、Windows)尚未完全测试,因此可能会遇到一些预期外的错误。建议先在Linux/Windows WSL上尝试使用,或者使用Google Colab来体验。
+我们的 Repo 已经在 Linux(Ubuntu 20.04)上进行了测试。其他操作系统平台(macOS、Windows)尚未完全测试,可能会遇到一些预期外的错误。建议先在 Linux 或 Windows WSL 上尝试使用,或者使用 Google Colab 来体验。
 
-对于CUDA 10.3-11.7,建议使用`v0.0.5`及更早版本。对于大于11.7的CUDA,请使用我们的稳定分支`>= v0.0.6`以获得更好的体验。
 ```bash
-git clone https://github.com/OptimalScale/LMFlow.git
+git clone -b v1.0.0 https://github.com/OptimalScale/LMFlow.git
 cd LMFlow
 conda create -n lmflow python=3.9 -y
 conda activate lmflow
 conda install mpi4py
-bash install.sh
+pip install -e .
 ```
 
+#### 可选依赖
+
+基础安装已足够支持全参数 / LoRA / LISA 微调以及 HuggingFace 后端推理。以下功能通过 extras 按需开启 —— 只装你需要的:
+
+| Extra | 启用功能 | 安装命令 |
+| ------------ | ----------------------------------------- | --------------------------------- |
+| `vllm` | vLLM 后端推理与 iterative DPO | `pip install -e ".[vllm]"` |
+| `sglang` | SGLang 后端推理与 iterative DPO | `pip install -e ".[sglang]"` |
+| `trl` | DPO / iterative DPO 训练 | `pip install -e ".[trl]"` |
+| `deepspeed` | DeepSpeed 集成 | `pip install -e ".[deepspeed]"` |
+| `flash_attn` | Flash Attention 2 | `pip install -e ".[flash_attn]"` |
+| `ray` | 分布式 reward model 推理 | `pip install -e ".[ray]"` |
+| `multimodal` | 多模态模型 | `pip install -e ".[multimodal]"` |
+| `gradio` | Gradio 聊天界面 | `pip install -e ".[gradio]"` |
+| `flask` | Flask 部署 | `pip install -e ".[flask]"` |
+
+extras 可以组合使用:`pip install -e ".[vllm,trl]"` 用于 vLLM 后端的 iterative DPO,`".[sglang,trl]"` 则使用 SGLang。
+
+> [!IMPORTANT]
+> vLLM 与 SGLang 依赖互不兼容的 CUDA / PyTorch 版本,不应安装到同一个环境。如果两者都要用,请创建独立的 conda 环境(例如 `lmflow-vllm` 和 `lmflow-sglang`)。
+
 ### 准备数据集
 请参考我们的 [官方文档(英文版)](https://optimalscale.github.io/LMFlow/examples/DATASETS.html)。官方文档正在汉化中,请耐心等待。
 
diff --git a/setup.py b/setup.py
index 2e5842a9a..ec990542f 100644
--- a/setup.py
+++ b/setup.py
@@ -18,12 +18,15 @@
 extra_require = {
     "multimodal": ["Pillow"],
     "vllm": ["vllm>=0.8.0"],
-    "sglang": ["sglang"],
+    # pybase64 is imported eagerly by sglang.utils but not declared as a hard
+    # dep upstream; without it `import sglang` raises ModuleNotFoundError.
+    "sglang": ["sglang", "pybase64"],
     "ray": ["ray>=2.22.0"],
     "gradio": ["gradio"],
     "flask": ["flask", "flask_cors"],
     "flash_attn": ["flash-attn>=2.0.2"],
-    "trl": ["trl==0.8.0"],
+    # rich is lazy-imported by trl's DPOTrainer; not declared in trl 0.11.x.
+    "trl": ["trl>=0.11,<0.12", "rich"],
     "deepspeed": ["deepspeed>=0.14.4"],
     "develop": ["pytest"],
     "dev": ["ruff", "pytest", "pre-commit"],
diff --git a/src/lmflow/pipeline/auto_pipeline.py b/src/lmflow/pipeline/auto_pipeline.py
index a7ab2dd3e..fb4ccb551 100644
--- a/src/lmflow/pipeline/auto_pipeline.py
+++ b/src/lmflow/pipeline/auto_pipeline.py
@@ -6,7 +6,7 @@
 from lmflow.pipeline.inferencer import Inferencer
 from lmflow.pipeline.rm_inferencer import RewardModelInferencer
 from lmflow.pipeline.rm_tuner import RewardModelTuner
-from lmflow.utils.versioning import is_package_version_at_least, is_ray_available, is_sglang_available, is_trl_available, is_vllm_available
+from lmflow.utils.versioning import is_package_version_at_least, is_sglang_available, is_trl_available, is_vllm_available
 
 PIPELINE_MAPPING = {
     "evaluator": Evaluator,
@@ -46,7 +46,7 @@
 else:
     PIPELINE_NEEDS_EXTRAS.extend(["dpo_aligner", "dpov2_aligner"])
 
-if is_vllm_available() and is_trl_available() and is_ray_available():
+if is_trl_available() and (is_vllm_available() or is_sglang_available()):
     from lmflow.pipeline.iterative_dpo_aligner import IterativeDPOAligner
 
     PIPELINE_MAPPING["iterative_dpo_aligner"] = IterativeDPOAligner
diff --git a/src/lmflow/pipeline/iterative_dpo_aligner.py b/src/lmflow/pipeline/iterative_dpo_aligner.py
index 6e384736f..fee0a2d27 100644
--- a/src/lmflow/pipeline/iterative_dpo_aligner.py
+++ b/src/lmflow/pipeline/iterative_dpo_aligner.py
@@ -16,10 +16,12 @@
 from lmflow.datasets.dataset import Dataset
 from lmflow.models.hf_decoder_model import HFDecoderModel
 from lmflow.models.hf_text_regression_model import HFTextRegressionModel
+from lmflow.pipeline.base_pipeline import BasePipeline
 from lmflow.pipeline.dpov2_aligner import MemorySafeDPOv2Aligner
 from lmflow.pipeline.rm_inferencer import RewardModelInferencer
-from lmflow.pipeline.vllm_inferencer import MemorySafeVLLMInferencer
 from lmflow.utils.common import print_banner
+from lmflow.utils.protocol import DataProto
+from lmflow.utils.versioning import is_sglang_available, is_vllm_available
 
 logger = logging.getLogger(__name__)
 
@@ -121,28 +123,65 @@ def _do_target_model_inference(
         dataset: Dataset,
         output_dir: str,
     ):
-        result_cache_path = str(Path(output_dir) / "cache" / "target_model_inference_result.json")
-        inferencer = MemorySafeVLLMInferencer(
+        inferencer_args = self._parse_target_model_inference_args(args=self.aligner_args)
+        inferencer = self._build_response_generator(
             model_args=model.model_args,
             data_args=dataset.data_args,
-            inferencer_args=self._parse_target_model_inference_args(
-                args=self.aligner_args,
-                result_cache_path=result_cache_path,
-            ),
+            inferencer_args=inferencer_args,
         )
-        res = inferencer.inference()
-
-        dataset_out = {"type": "text_to_textlist", "instances": res}
+        res = inferencer.inference(model=model, dataset=dataset, release_gpu=True)
+        instances = self._dataproto_to_text_to_textlist_instances(res)
 
         target_model_inference_result_dir = Path(output_dir) / "target_model_inference_result"
         target_model_inference_result_dir.mkdir(parents=True, exist_ok=True)
         json.dump(
-            dataset_out,
+            {"type": "text_to_textlist", "instances": instances},
             open(str(target_model_inference_result_dir / "result.json"), "w", encoding="utf-8"),
             ensure_ascii=False,
             indent=4,
         )
 
+    @staticmethod
+    def _build_response_generator(
+        model_args: ModelArguments,
+        data_args: DatasetArguments,
+        inferencer_args: InferencerArguments,
+    ) -> BasePipeline:
+        engine = inferencer_args.inference_engine
+        if engine == "vllm":
+            if not is_vllm_available():
+                raise ImportError('vllm is not installed. Install via `pip install -e ".[vllm]"`.')
+            from lmflow.pipeline.vllm_inferencer import VLLMInferencer
+
+            return VLLMInferencer(model_args, data_args, inferencer_args)
+        if engine == "sglang":
+            if not is_sglang_available():
+                raise ImportError('sglang is not installed. Install via `pip install -e ".[sglang]"`.')
+            from lmflow.pipeline.sglang_inferencer import SGLangInferencer
+
+            return SGLangInferencer(model_args, data_args, inferencer_args)
+        raise ValueError(
+            f"iterative_dpo_aligner: unsupported inference_engine={engine!r}. Use 'vllm' or 'sglang'."
+        )
+
+    @staticmethod
+    def _dataproto_to_text_to_textlist_instances(res: DataProto) -> list[dict]:
+        # VLLMInferencer flattens n samples by repeat-interleaving inputs (see
+        # HFDecoderModel.prepare_inputs_for_inference); each block of
+        # `actual_n_rollouts` consecutive rows shares the same prompt. Group
+        # them back into one instance per prompt.
+        n_rollouts = res.meta_info["actual_n_rollouts"]
+        inputs_flat = res.non_tensor_batch["inputs"].tolist()
+        outputs_flat = res.non_tensor_batch["outputs"].tolist()
+        if len(inputs_flat) % n_rollouts != 0:
+            raise ValueError(
+                f"Inference result length {len(inputs_flat)} is not a multiple of n_rollouts={n_rollouts}"
+            )
+        return [
+            {"input": inputs_flat[i], "output": outputs_flat[i : i + n_rollouts]}
+            for i in range(0, len(inputs_flat), n_rollouts)
+        ]
+
     def _do_reward_model_inference(
         self,
         model: HFTextRegressionModel,
@@ -191,16 +230,11 @@ def _do_single_dpo_align(
     def _parse_target_model_inference_args(
         self,
         args: IterativeDPOAlignerArguments,
-        result_cache_path: str,
     ) -> InferencerArguments:
-        inferencer_args = self.__filter_args(
+        return self.__filter_args(
             mixed_args=args,
             target_cls=InferencerArguments,
         )
-        inferencer_args.save_results = True
-        inferencer_args.results_path = result_cache_path
-
-        return inferencer_args
 
     def _parse_reward_model_inference_args(
         self,
diff --git a/src/lmflow/pipeline/vllm_inferencer.py b/src/lmflow/pipeline/vllm_inferencer.py
index 15d80f8fd..835855553 100644
--- a/src/lmflow/pipeline/vllm_inferencer.py
+++ b/src/lmflow/pipeline/vllm_inferencer.py
@@ -7,6 +7,7 @@
 os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"
 import subprocess
 import sys
+import warnings
 from typing import Optional
 
 from transformers import AutoTokenizer
@@ -121,8 +122,14 @@ def load_inference_results(
 class MemorySafeVLLMInferencer(VLLMInferencer):
     """Run VLLM inference in a subprocess for memory safety.
 
-    This is a workaround since vllm cannot release GPU memory properly
-    in-process. See: https://github.com/vllm-project/vllm/issues/1908
+    .. deprecated::
+        Scheduled for removal in lmflow 1.1.0. Use :class:`VLLMInferencer`
+        with ``release_gpu=True`` for the common single-GPU case, or wait
+        for the sleep-mode-based replacement that will land alongside the
+        vllm>=0.11 pin. This subprocess wrapper was a workaround for vllm's
+        inability to release GPU memory in-process
+        (https://github.com/vllm-project/vllm/issues/1908); the in-process
+        path is now reliable for most use cases.
     """
 
     def __init__(
@@ -131,6 +138,12 @@ def __init__(
         data_args: DatasetArguments,
         inferencer_args: InferencerArguments,
     ):
+        warnings.warn(
+            "MemorySafeVLLMInferencer is deprecated and will be removed in lmflow 1.1.0. "
+            "Use VLLMInferencer with release_gpu=True instead.",
+            DeprecationWarning,
+            stacklevel=2,
+        )
         assert inferencer_args.save_inference_results or inferencer_args.save_results, (
             "For MemorySafeVLLMInferencer, `save_inference_results` must be True."
         )
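
Reviewer note: the regrouping done by `_dataproto_to_text_to_textlist_instances` in the patch can be checked without vLLM or SGLang installed. The sketch below is a standalone illustration, not the LMFlow implementation. It assumes plain Python lists in place of `DataProto.non_tensor_batch`, and the name `group_rollouts` and the extra length checks are hypothetical additions for the example.

```python
from typing import Any


def group_rollouts(
    inputs_flat: list[str],
    outputs_flat: list[str],
    n_rollouts: int,
) -> list[dict[str, Any]]:
    """Group repeat-interleaved rows into one instance per prompt.

    Hypothetical helper: assumes each block of `n_rollouts` consecutive rows
    shares the same prompt, as the comment in the patch describes.
    """
    if n_rollouts <= 0:
        raise ValueError(f"n_rollouts must be positive, got {n_rollouts}")
    if len(inputs_flat) != len(outputs_flat):
        raise ValueError("inputs and outputs must have the same length")
    if len(inputs_flat) % n_rollouts != 0:
        raise ValueError(
            f"Inference result length {len(inputs_flat)} is not a multiple of n_rollouts={n_rollouts}"
        )
    # Take the prompt from the first row of each block and collect the
    # block's outputs into a list, matching the text_to_textlist layout.
    return [
        {"input": inputs_flat[i], "output": outputs_flat[i : i + n_rollouts]}
        for i in range(0, len(inputs_flat), n_rollouts)
    ]


# Two prompts, two sampled responses each, flattened prompt-major.
example = group_rollouts(
    inputs_flat=["p0", "p0", "p1", "p1"],
    outputs_flat=["a", "b", "c", "d"],
    n_rollouts=2,
)
```

Like the patched method, the sketch reads the prompt from the first row of each block and does not verify that the remaining rows in the block carry the same prompt. If an inference backend ever returned rows in a different order, the grouping would silently mix prompts, so a per-block equality check may be worth considering.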