This repository was archived by the owner on Feb 20, 2026. It is now read-only.

nlzy/vllm-gfx906

vLLM for gfx906
===================


ARCHIVED
-------------------

Over the past year, almost all open-weight models have grown increasingly large,
far exceeding my capacity for development and testing. Coupled with the rising
price of MI50 GPUs, I am reluctant to keep adding more cards. I have now
archived this repository; everyone is welcome to fork it.


ORIGINAL README
-------------------

This is a modified version of vLLM that works with (and only with) AMD gfx906
GPUs such as Radeon VII / Radeon Pro VII / Instinct MI50 / Instinct MI60.

This fork was (and still is) just a passion project shared for fun. I won't be
putting much effort into it. Use it at your own risk, and please don't use it
as a reference for your GPU purchasing decisions.


RUN WITH DOCKER
-------------------

Please install ROCm 6.3 first; only the kernel-mode driver is required. Refer
to AMD's official documentation.

```
docker pull nalanzeyu/vllm-gfx906
docker run -it --rm --shm-size=2g --device=/dev/kfd --device=/dev/dri \
    --group-add video -p 8000:8000 -v <YOUR_MODEL_PATH>:/model \
    nalanzeyu/vllm-gfx906 vllm serve /model
```
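
Once the container is up, the server can be checked from the host. The sketch
below assumes the port mapping from the command above (8000 on localhost) and
vLLM's OpenAI-compatible HTTP endpoints; the function names are made up for
illustration, and the model name "/model" assumes the default where the served
name equals the path passed to `vllm serve`.

```shell
# Assumption: the container above is running and publishes port 8000.
BASE_URL="http://localhost:8000"

# List the models the server exposes.
list_models() {
    curl -s "${BASE_URL}/v1/models"
}

# Send a short completion request. The prompt is pasted into the JSON body
# without escaping, so keep it free of quotes and backslashes.
complete_prompt() {
    curl -s "${BASE_URL}/v1/completions" \
        -H "Content-Type: application/json" \
        -d "{\"model\": \"/model\", \"prompt\": \"$1\", \"max_tokens\": 16}"
}
```

For example, run `list_models` first, then `complete_prompt "Hello"`.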


SUPPORTED QUANTIZATIONS
-------------------

See #29

GPTQ and AWQ are the recommended first-choice quantization formats.

vLLM's llm-compressor with the W4A16 INT format is also recommended. Other
llm-compressor formats are not supported.

All quantized MoE models run very slowly, and all unquantized models run
somewhat slowly. Neither is recommended.
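
To see which format a downloaded model uses, one option is to read its
config.json. This is only a sketch: it assumes a Hugging Face style checkpoint
whose config.json records the method under quantization_config.quant_method
(commonly "gptq", "awq", or "compressed-tensors" for llm-compressor output),
and the helper name quant_method is hypothetical.

```shell
# Print the quantization method recorded in a config.json file.
# Prints "none" when the file has no quantization_config section.
# Assumption: python3 is available on the PATH.
quant_method() {
    python3 - "$1" <<'EOF'
import json
import sys

with open(sys.argv[1]) as f:
    cfg = json.load(f)
print(cfg.get("quantization_config", {}).get("quant_method", "none"))
EOF
}
```

For example: `quant_method <YOUR_MODEL_PATH>/config.json`.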


BUILD
-------------------

Please install ROCm 6.3 first. You need to install both kernel-mode driver and
ROCm packages. Refer to the official documentation by AMD.

You also need python-venv and python-dev. On Debian / Ubuntu, install them with:
$ sudo apt install python3-venv python3-dev

You also need triton-gfx906 v3.5.0+gfx906; see:
https://github.com/nlzy/triton-gfx906/tree/v3.5.0+gfx906

```
cd vllm-gfx906

python3 -m venv vllmenv
source vllmenv/bin/activate

pip3 install torch==2.9 torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.3
pip3 install -r requirements/rocm-build.txt -r requirements/rocm.txt

pip3 install --no-build-isolation --no-deps -v .
```
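
After the build, a quick sanity check can confirm that the GPU is visible and
that the package imports. The sketch below is not part of the original build
steps; it assumes rocminfo from the ROCm packages is on the PATH, and the
helper name has_gfx906 is made up for illustration.

```shell
# Succeeds if the text on stdin mentions a gfx906 device.
has_gfx906() {
    grep -q 'gfx906'
}

# Typical use, inside the activated venv:
#   rocminfo | has_gfx906 && echo "gfx906 GPU detected"
#   python3 -c "import vllm; print(vllm.__version__)"
```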


CREDITS
-------------------

https://github.com/Said-Akbar/vllm-rocm
