Enable CUDA Graphs with vLLM Data Parallel by ihebchaa · Pull Request #3020 · EleutherAI/lm-evaluation-harness

ihebchaa · 2025-05-27T06:40:37Z

Problem:

When using vLLM with data_parallel_size > 1, the current implementation forces enforce_eager=True, which disables CUDA graphs and significantly hurts performance. This is particularly problematic for reasoning models that require a large max_new_tokens (e.g., 32k+ tokens).

Solution:

This PR removes the forced enforce_eager=True when using data parallelism.

Key Changes

Improved CUDA device isolation: each worker process now sets its own CUDA_VISIBLE_DEVICES by running in an isolated environment.
Removed forced eager execution: enforce_eager=True is no longer set when data_parallel_size > 1.
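
To illustrate the device-isolation point, here is a minimal sketch of how each data-parallel worker could be handed its own GPUs through a private copy of the environment. It is not the code from this PR: the helper names (`visible_devices_for_rank`, `worker_env`, `launch_worker`) are hypothetical, and the contiguous-block GPU assignment is an assumption. The child process here only prints what it sees; a real worker would build its vLLM engine at that point.

```python
import os
import subprocess
import sys


def visible_devices_for_rank(dp_rank: int, tp_size: int) -> str:
    """Return a CUDA_VISIBLE_DEVICES value for one data-parallel rank.

    Assumption: rank r owns the contiguous block of tp_size GPUs
    starting at index r * tp_size.
    """
    first = dp_rank * tp_size
    return ",".join(str(i) for i in range(first, first + tp_size))


def worker_env(dp_rank: int, tp_size: int) -> dict:
    """Copy the parent environment and pin the GPUs this worker may see."""
    env = dict(os.environ)  # a copy, so the parent process is left untouched
    env["CUDA_VISIBLE_DEVICES"] = visible_devices_for_rank(dp_rank, tp_size)
    return env


def launch_worker(dp_rank: int, tp_size: int) -> str:
    """Run a child interpreter with its own environment and return what it sees."""
    code = "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=worker_env(dp_rank, tp_size),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Because the variable is set in the child's environment before any CUDA library is loaded there, each worker only ever enumerates its own devices, and the parent's environment stays unchanged.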

Tests

Tested with tp=2 and dp=1, 2, and 4 on a 7B reasoning model.

CLAassistant · 2025-05-27T06:40:44Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

Iheb Chaabane seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

baberabb · 2025-05-27T11:04:15Z

Hi! Thanks for the PR. Do you know why they enforce eager mode if you follow their public API?

ihebchaa · 2025-05-27T11:18:06Z

It's not clear to me why it's forced here.
Multiple vLLM instances are created separately on each dp rank and do not communicate with each other, so I don't see why enforce_eager is enforced.
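
The independence described above can be sketched as follows: if the replicas never talk to each other, the only coordination needed is deciding which requests go to which rank. This is an illustrative assumption, not the harness's actual splitting logic; `shard_requests` is a hypothetical helper and round-robin is just one possible policy.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shard_requests(requests: Sequence[T], dp_size: int) -> List[List[T]]:
    """Split requests round-robin into one independent list per dp rank.

    Each rank's list can be processed by its own engine instance with no
    cross-rank communication; results are gathered afterwards.
    """
    if dp_size < 1:
        raise ValueError("dp_size must be at least 1")
    # Rank r takes items r, r + dp_size, r + 2 * dp_size, ...
    return [list(requests[rank::dp_size]) for rank in range(dp_size)]
```

For example, `shard_requests(["a", "b", "c", "d", "e"], 2)` gives rank 0 the items `a`, `c`, `e` and rank 1 the items `b`, `d`.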

update

19e7e13

ihebchaa requested review from StellaAthena and baberabb as code owners May 27, 2025 06:40

younesbelkada mentioned this pull request May 28, 2025

Fix: fix vllm issue with DP>1 #3025

Merged

slimfrkha mentioned this pull request May 30, 2025

Optimize Evaluation Workflow for Better Batching and Model Reuse For benchmarks with n_repeat > 1 mlfoundations/evalchemy#125

Open

slimfrkha mentioned this pull request Sep 17, 2025

Ignore seed when splitting batch in chunks with groupby #3047

Merged

ihebchaa wants to merge 1 commit into EleutherAI:main from ihebchaa:fix/vllm-dp-non-eager

3 participants
