GitHub - nlzy/vllm-gfx906: vLLM for AMD gfx906 GPUs, e.g. Radeon VII / MI50 / MI60

This repository was archived by the owner on Feb 20, 2026. It is now read-only.

Name		Name	Last commit message	Last commit date
Latest commit History 11,962 Commits
.buildkite		.buildkite
.gemini		.gemini
.github.orig		.github.orig
benchmarks		benchmarks
cmake		cmake
csrc		csrc
docker		docker
docs		docs
examples		examples
requirements		requirements
tests		tests
tools		tools
vllm		vllm
.clang-format		.clang-format
.coveragerc		.coveragerc
.dockerignore		.dockerignore
.git-blame-ignore-revs		.git-blame-ignore-revs
.gitignore		.gitignore
.markdownlint.yaml		.markdownlint.yaml
.pre-commit-config.yaml		.pre-commit-config.yaml
.readthedocs.yaml		.readthedocs.yaml
.shellcheckrc		.shellcheckrc
.yapfignore		.yapfignore
CMakeLists.txt		CMakeLists.txt
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
DCO		DCO
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
README		README
README.md.orig		README.md.orig
RELEASE.md		RELEASE.md
SECURITY.md		SECURITY.md
codecov.yml		codecov.yml
mkdocs.yaml		mkdocs.yaml
pyproject.toml		pyproject.toml
setup.py		setup.py
use_existing_torch.py		use_existing_torch.py

Repository files navigation

vLLM for gfx906
===================

ARICHIVED
-------------------

Over the past year, almost all open-weight models have grown increasingly large,
far exceeding my capacity for development and testing. Coupled with the rising
price of MI50 GPUs, I am reluctant to keep adding more cards. I have now
archived this repository; everyone is welcome to fork it.

ORIGINAL README
-------------------

This is a modified version of vLLM, works with (and only works with) AMD gfx906
GPUs such as Radeon VII / Radeon Pro VII / Instinct MI50 / Instinct MI60.

This fork was (and still is) just a passion project shared for fun. I won't be
putting much effort into it. Use it at your own risk, especially please don't
use it as a reference for your GPU purchasing decisions.

RUN WITH DOCKER
-------------------

Please install ROCm 6.3 first, only kernel-mode driver is required. Refer to
the official documentation by AMD.

```
docker pull nalanzeyu/vllm-gfx906
docker run -it --rm --shm-size=2g --device=/dev/kfd --device=/dev/dri \
--group-add video -p 8000:8000 -v <YOUR_MODEL_PATH>:/model \
nalanzeyu/vllm-gfx906 vllm serve /model
```

SUPPORT QUANTIZATIONS
-------------------

See #29

GPTQ and AWQ are the first recommended quantization formats.

vLLM's llm-compressor with W4A16 INT format is also recommended. Other formats
in llm-compressor are not support.

All MoE quantization models are significantly slow, and all unquantized models
are slightly slow. Not recommended to use.

BUILD
-------------------

Please install ROCm 6.3 first. You need to install both kernel-mode driver and
ROCm packages. Refer to the official documentation by AMD.

You also need python-venv / python-dev, on Debian / Ubuntu use this command:
$ sudo apt install python3-venv python3-dev

You also need triton-gfx906 v3.5.0+gfx906 see:
https://github.com/nlzy/triton-gfx906/tree/v3.5.0+gfx906

```
cd vllm-gfx906

python3 -m venv vllmenv
source vllmenv/bin/activate

pip3 install torch==2.9 torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.3
pip3 install -r requirements/rocm-build.txt -r requirements/rocm.txt

pip3 install --no-build-isolation --no-deps -v .
```

CREDITS
-------------------

https://github.com/Said-Akbar/vllm-rocm