Chaofeng Chen¹, Sensen Yang¹, Haoning Wu¹, Liang Liao¹, Zicheng Zhang³, Annan Wang¹, Wenxiu Sun², Qiong Yan², Weisi Lin¹

¹S-Lab, Nanyang Technological University, ²SenseTime Research, ³Shanghai Jiao Tong University
Q-Ground is a multimodal model for localizing and describing image quality distortions. This repository provides the released model weights, dataset download utilities, a Gradio demo, evaluation scripts, and multi-stage training scripts used for the released version.
- ✅ Release all stage model weights in the 🤗 Hugging Face Q-Ground Collection
- ✅ Release test codes
- ✅ Release training codes
- ✅ Release datasets in 🤗 Hugging Face QGround-100K
Create a Python environment first. Install a PyTorch build compatible with your CUDA setup before installing the project dependencies.
```shell
conda create -n qg python=3.10 -y
conda activate qg
# Install PyTorch and DeepSpeed separately if needed for your machine.
pip install -r requirements.txt
```
Launch the Gradio demo:
```shell
bash test_chat.sh
```

This starts chat_app.py, saves predicted masks to ./tmp_chat_masks, and exposes a local Gradio interface with example images from ./example_imgs.
The training pipeline expects datasets under ./dataset. The released dataset snapshot on Hugging Face contains the required data for training and evaluation, organized in a way that the training scripts can directly use after extraction.
Download the released dataset snapshot from Hugging Face into ./dataset:
```shell
python download_datasets.py
```

If you do not have direct access to Hugging Face, use the mirror endpoint before downloading:

```shell
export HF_ENDPOINT=https://hf-mirror.com
python download_datasets.py
```

Extract all supported archives under ./dataset:

```shell
python extract_archives.py --root ./dataset
```

If you want to inspect or customize label processing, see ./utils_label.py. The released download and extraction flow is intended to perform the required preparation automatically.
The released training pipeline uses four stages. Compared with the paper version, the released model adds a mask decoder pretraining stage before final Q-Ground finetuning.
| Stage | Script | Initialization | Purpose |
|---|---|---|---|
| 1 | ./train_stage1.sh | liuhaotian/llava-v1.5-7b | Align multiscale visual-language features |
| 2 | ./train_stage2.sh | Stage 1 checkpoint | Improve instruction following |
| 3 | ./train_stage3.sh | Stage 2 checkpoint | Pretrain the mask decoder with segmentation-heavy data |
| 4 | ./train_stage4.sh | Stage 3 checkpoint | Finetune on Q-Ground quality grounding data |
Run the stages in order with `bash train_stage[1-4].sh`.
Each stage script launches ./train_ds.py with DeepSpeed and saves a Hugging Face-style checkpoint to a local output directory (for example, ./qg_model_multiscale_stage1 to ./qg_model_multiscale_stage4).
- The scripts expect GPUs and a working DeepSpeed installation.
- The default dataset root is ./dataset/.
- The released Stage 3 mask decoder pretraining noticeably improves mask quality compared with the paper configuration.
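Because each stage initializes from the previous stage's checkpoint, the scripts must run sequentially. A tiny Python driver that enforces this ordering (a sketch, not part of the repository; `run_stages` is a hypothetical helper):

```python
import subprocess


def run_stages(scripts: list[str]) -> None:
    """Run each training-stage script in order, aborting on the first failure."""
    for script in scripts:
        print(f"Running {script} ...")
        # check=True raises CalledProcessError if a stage exits non-zero,
        # so a later stage never starts from a missing or broken checkpoint.
        subprocess.run(["bash", script], check=True)


# Intended usage, assuming the repo's stage scripts are present:
# run_stages([f"./train_stage{i}.sh" for i in range(1, 5)])
```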
Released checkpoints are available in the 🤗 Hugging Face Q-Ground Collection.
The default inference and evaluation scripts load `--version chaofengc/qg_model_multiscale`.
You can override `--version` in ./evaluate_model.py or adapt the shell scripts to test a local stage checkpoint.
Evaluate the released model on the Q-Ground benchmark with:
```shell
bash test_bench.sh
```

The script runs ./evaluate_model.py and stores outputs in ./tmp_qground_results.
Compared with the paper, the extra mask decoder pretraining stage (Stage 3) significantly improves mask prediction performance (mIoU), which is key to quality grounding. Benchmark results of the final Q-Ground model, compared with the paper version:
| Model | Jitter | Noise | Overexposure | Blur | Low light | Average |
|---|---|---|---|---|---|---|
| Paper mIoU | 0.434 | 0.051 | 0.125 | 0.460 | 0.219 | 0.271 |
| Released mIoU | 0.4201 | 0.1793 | 0.2756 | 0.4651 | 0.2971 | 0.3274 |
| Paper mAcc | 0.720 | 0.176 | 0.459 | 0.648 | 0.337 | 0.539 |
| Released mAcc | 0.7187 | 0.4138 | 0.5172 | 0.6098 | 0.4461 | 0.5411 |
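For reference, the per-category mIoU above is the standard intersection-over-union between predicted and ground-truth distortion masks, averaged over test images. A minimal NumPy sketch of the metric (illustrative only, not the repository's evaluation code):

```python
import numpy as np


def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-union of two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    # Convention assumed here: two empty masks count as a perfect match.
    return float(inter / union) if union > 0 else 1.0


# Toy example: a 4x4 prediction overlapping the ground truth in 2 of 6 pixels.
pred = np.zeros((4, 4), dtype=bool); pred[0:2, 0:2] = True  # 4 predicted pixels
gt = np.zeros((4, 4), dtype=bool); gt[1:3, 0:2] = True      # 4 ground-truth pixels
print(iou(pred, gt))  # 2 / 6 ≈ 0.333
```

mIoU for a category is then the mean of this score over all test images in that category.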
If you find this work useful, please consider citing our paper:
```bibtex
@inproceedings{chen2024qground,
  title={Q-Ground: Image Quality Grounding with Large Multi-modality Models},
  author={Chaofeng Chen and Sensen Yang and Haoning Wu and Liang Liao and Zicheng Zhang and Annan Wang and Wenxiu Sun and Qiong Yan and Weisi Lin},
  booktitle={ACM International Conference on Multimedia},
  year={2024},
}
```

This project is based on PixelLM, LISA, and LLaVA. Thanks to the authors for their great work!
