HankLiu2020/M4-SAM

M⁴-SAM

"M⁴-SAM: Multi-Modal Mixture-of-Experts with Memory-Augmented SAM for RGB-D Video Salient Object Detection"

by Jiyuan Liu, Jia Lin, Xiaofei Zhou*, Runmin Cong, Deyang Liu, Zhi Liu

🎉 Accepted to CVPR 2026!

📑 Paper (arXiv) | 📄 CVPR Open Access | 💻 Code (GitHub)
⭐ If you find this work helpful, please consider giving us a star!

🧠 Overview

We propose M⁴-SAM, a prompt-free framework that adapts SAM2 to RGB-D video salient object detection (VSOD) by introducing modality-aware parameter-efficient fine-tuning (PEFT), hierarchical feature fusion, and prompt-free memory initialization.

Key Highlights:

  • 💡 Modality-Aware MoE-LoRA: extends vanilla LoRA with convolutional experts and modality-specific routing, enabling adaptive RGB-D feature fusion and efficient fine-tuning.
  • 🧩 Gated Multi-Level Feature Fusion: hierarchically aggregates multi-scale encoder features with an adaptive gating mechanism to balance spatial details and semantic context.
  • 🚀 Pseudo-Guided Initialization: bootstraps the memory bank using a coarse mask as a pseudo prior, enabling zero-shot VSOD without manual prompts.
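
For readers new to these components, the equations below write out standard LoRA, a generic mixture of LoRA experts with a per-modality router, and one common form of gated feature fusion. This is a sketch in our own notation (W_0, A_i, B_i, r, α, N, R_m, g_m, F_l, F_h and G are assumptions, not symbols taken from the paper) and not the exact M⁴-SAM formulation: in M⁴-SAM the experts are convolutional, so the low-rank products B_i A_i would be replaced by small convolutional branches.

```latex
% (1) Standard LoRA: frozen weight W_0 plus a trainable rank-r update
h = W_0 x + \frac{\alpha}{r} B A x,
\qquad A \in \mathbb{R}^{r \times d_{\mathrm{in}}},\;
B \in \mathbb{R}^{d_{\mathrm{out}} \times r}

% (2) Generic mixture of N LoRA experts; the router R_m is selected by the
%     input modality m \in \{\mathrm{rgb}, \mathrm{depth}\}
h_m = W_0 x_m + \frac{\alpha}{r} \sum_{i=1}^{N} g_{m,i}(x_m)\, B_i A_i x_m,
\qquad g_m(x_m) = \operatorname{softmax}(R_m x_m) \in \mathbb{R}^{N}

% (3) Generic gated fusion of a high-resolution feature F_l and an upsampled,
%     more semantic feature F_h (\odot: element-wise product)
G = \sigma\big(\operatorname{Conv}([F_l; F_h])\big),
\qquad F = G \odot F_l + (1 - G) \odot F_h
```

In (2) only the experts and routers are trained while W_0 stays frozen, which is what keeps this style of fine-tuning parameter-efficient; in (3) the gate G decides, per position and channel, how much spatial detail versus semantic context passes through.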

⚑ Getting Started

OS/Hardware Compatibility Note:

This codebase was developed and tested exclusively on Ubuntu/Linux. We strongly recommend using a Linux environment.

Please note that the reproduced metrics may vary slightly due to differences in OS versions, GPU models, and CUDA drivers. We appreciate your understanding.

1. Environment Setup

# Enter the codebase directory
cd M4SAM_Code

# Create and activate conda environment
conda create -n m4sam python=3.10
conda activate m4sam

# Install PyTorch with CUDA support
conda install pytorch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 pytorch-cuda=12.4 -c pytorch -c nvidia

# Install other dependencies
pip install -r requirements.txt

2. Download SAM2 Pretrained Weights

The following commands download the SAM2 checkpoint sam2.1_hiera_large.pt from Meta AI into checkpoints/.

cd checkpoints
bash download_sam_ckpt.sh
cd ..

📊 Reproduce Results

We provide our model checkpoints to help you easily reproduce the performance metrics reported in our paper.

Download Artifacts & Prepare Datasets

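| Dataset    | Source Repo                                | Checkpoint       |
|------------|--------------------------------------------|------------------|
| RDVS       | https://github.com/kerenfu/RDVS            | M4SAM-rdvs.pth   |
| ViDSOD-100 | https://github.com/jhl-Det/RGBD_Video_SOD  | M4SAM-vidsod.pth |
| DViSal     | https://github.com/DVSOD/DVSOD-DViSal      | M4SAM-dvisal.pth |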

Dataset Path Note: Please download the datasets from their official sources linked in the table above. Extract them into a single parent directory (e.g., /data). Your folder structure should look like this:

/data/
├── DViSal_dataset/
│   ├── data/
│   └── test_all.txt
├── RDVS/
│   ├── test/
│   └── train/
└── VidSOD/
    ├── test/
    └── train/

When running evaluation or training, the --test_image_path / --train_image_path argument should point to this parent directory (e.g., /data).
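
Before launching a run, it can help to confirm that the extracted datasets match the tree above. The snippet below is a minimal sketch and not part of the released code: check_data_layout is a hypothetical helper that only tests whether the listed entries exist for one dataset name (rdvs, vidsod or dvisal, the same values the scripts pass to --dataset).

```shell
#!/bin/bash
# Hypothetical helper (not part of the M4-SAM code): check that the entries
# shown in the dataset tree exist under a data parent directory.
# Returns 0 if all exist, 1 if any is missing, 2 for an unknown dataset name.
# Usage: check_data_layout /data rdvs
check_data_layout() {
    local root="$1" dataset="$2" required p status=0
    case "$dataset" in
        dvisal) required="DViSal_dataset/data DViSal_dataset/test_all.txt" ;;
        rdvs)   required="RDVS/test RDVS/train" ;;
        vidsod) required="VidSOD/test VidSOD/train" ;;
        *) echo "unknown dataset: $dataset" >&2; return 2 ;;
    esac
    # Word splitting on $required is intentional: these relative paths contain no spaces.
    for p in $required; do
        if [ ! -e "$root/$p" ]; then
            echo "missing: $root/$p" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, `check_data_layout /data rdvs && echo "RDVS layout OK"` prints the confirmation only when both RDVS/test and RDVS/train are present; every missing path is reported on stderr rather than stopping at the first one.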

Place the downloaded checkpoints under the checkpoints/ directory:

checkpoints/
├── sam2.1_hiera_large.pt       # SAM2 (from Step 2)
├── M4SAM-dvisal.pth            # DViSal
├── M4SAM-rdvs.pth              # RDVS
└── M4SAM-vidsod.pth            # ViDSOD-100
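
The evaluation script below passes checkpoints/M4SAM-${dataset}.pth to --ckpt, so a missing or truncated download only surfaces once test.py starts. As a quick pre-flight check, here is a minimal sketch; check_ckpts is a hypothetical helper, not part of the released code, and it only checks that the files exist and are non-empty, not that they are valid checkpoints.

```shell
#!/bin/bash
# Hypothetical helper (not part of the M4-SAM code): verify that the SAM2
# weights and the M4-SAM checkpoint for one dataset are present and non-empty.
# Paths are relative, so run it from the codebase root (where checkpoints/ lives).
# Returns 0 if both files look fine, 1 otherwise, 2 for an unknown dataset name.
# Usage: check_ckpts rdvs
check_ckpts() {
    local dataset="$1" f status=0
    case "$dataset" in
        rdvs|vidsod|dvisal) ;;
        *) echo "unknown dataset: $dataset" >&2; return 2 ;;
    esac
    # Same naming pattern as --ckpt checkpoints/M4SAM-${dataset}.pth
    for f in checkpoints/sam2.1_hiera_large.pt "checkpoints/M4SAM-${dataset}.pth"; do
        # -s: the file exists and has a size greater than zero
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    return "$status"
}
```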

Verify Results

You can run both inference and evaluation using the following parameterized bash script.

#!/bin/bash
# A quick script to run inference and evaluation

vid_len=16
device=0
dataset="rdvs" # Options: "rdvs", "vidsod", "dvisal"
data_path="/data" # Update this to your local data parent directory
output_dir="./results/${dataset}_pred"

# Set ground truth path based on dataset
if [ "$dataset" = "dvisal" ]; then
    gt_path="${data_path}/DViSal_dataset/data"
elif [ "$dataset" = "rdvs" ]; then
    gt_path="${data_path}/RDVS/test"
elif [ "$dataset" = "vidsod" ]; then
    gt_path="${data_path}/VidSOD/test"
else
    echo "Unknown dataset: $dataset" >&2
    exit 1
fi

echo "Step 1: Running inference..."
python test.py \
    --vid_len $vid_len \
    --device $device \
    --ckpt checkpoints/M4SAM-${dataset}.pth \
    --test_image_path "$data_path" \
    --dataset $dataset \
    --save_path "$output_dir" \
    --save 1

echo "Step 2: Evaluating..."
python eval_tool.py \
    --dataset $dataset \
    --pred_path "$output_dir" \
    --gt_path "$gt_path"
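
To reproduce all three benchmarks in one run, the script above can be wrapped in a loop. This is a sketch under our own assumptions: run, gt_dir and eval_all are hypothetical helper names, the test.py and eval_tool.py arguments are copied from the script above, and setting DRY_RUN=1 prints the commands instead of executing them so the loop can be checked before anything is launched.

```shell
#!/bin/bash
# Hypothetical wrapper (not part of the M4-SAM code): run inference and
# evaluation for all three datasets with the arguments from the script above.
data_path="${data_path:-/data}"  # update to your local data parent directory

# Execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Ground-truth directory for eval_tool.py (same mapping as the script above).
gt_dir() {
    case "$1" in
        dvisal) echo "$2/DViSal_dataset/data" ;;
        rdvs)   echo "$2/RDVS/test" ;;
        vidsod) echo "$2/VidSOD/test" ;;
        *) echo "unknown dataset: $1" >&2; return 1 ;;
    esac
}

eval_all() {
    local dataset out gt
    for dataset in rdvs vidsod dvisal; do
        out="./results/${dataset}_pred"
        gt=$(gt_dir "$dataset" "$data_path") || return 1
        # Stop at the first failure so eval_tool.py never scores stale predictions.
        run python test.py --vid_len 16 --device 0 \
            --ckpt "checkpoints/M4SAM-${dataset}.pth" \
            --test_image_path "$data_path" --dataset "$dataset" \
            --save_path "$out" --save 1 || return 1
        run python eval_tool.py --dataset "$dataset" \
            --pred_path "$out" --gt_path "$gt" || return 1
    done
}

# Usage: DRY_RUN=1 eval_all   # inspect the six commands first
#        eval_all             # then run them
```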

πŸ‹οΈ Training

Training uses PyTorch DDP for distributed multi-GPU training.

#!/bin/bash

dataset="rdvs" # Options: "rdvs", "vidsod", "dvisal"
data_path="/data" # Update this to your local data parent directory

# Set epoch based on dataset
if [ "$dataset" = "dvisal" ]; then
    epoch=50
elif [ "$dataset" = "rdvs" ]; then
    epoch=60
elif [ "$dataset" = "vidsod" ]; then
    epoch=30
else
    echo "Unknown dataset: $dataset" >&2
    exit 1
fi

python train_ddp.py \
    --batch_size 4 \
    --device 0,1 \
    --epoch $epoch \
    --vid_len 4 \
    --conti 0 \
    --lr 0.001 \
    --sync_bn 1 \
    --dataset $dataset \
    --train_image_path "$data_path"

Acknowledgement

Our work would not have been possible without the following open-source projects:

Thanks for their great contributions!

Citation

If you find our work useful, please cite our paper. Thank you!

@InProceedings{Liu_2026_CVPR,
    author    = {Liu, Jiyuan and Lin, Jia and Zhou, Xiaofei and Cong, Runmin and Liu, Deyang and Liu, Zhi},
    title     = {M4-SAM: Multi-Modal Mixture-of-Experts with Memory-Augmented SAM for RGB-D Video Salient Object Detection},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {24970--24979}
}

License

This project is licensed under the CC BY-NC 4.0 License.
