M⁴-SAM

"M⁴-SAM: Multi-Modal Mixture-of-Experts with Memory-Augmented SAM for RGB-D Video Salient Object Detection"

by Jiyuan Liu, Jia Lin, Xiaofei Zhou*, Runmin Cong, Deyang Liu, Zhi Liu

🎉 CVPR 2026 Accepted!

📑 Paper (arXiv) | 📄 CVPR Open Access | 💻 Code (GitHub)
⭐ If you find this work helpful, please consider giving us a star!

🧠 Overview

We propose M⁴-SAM, a prompt-free framework that adapts SAM2 to RGB-D video salient object detection (VSOD) through three components: modality-aware parameter-efficient fine-tuning (PEFT), hierarchical feature fusion, and prompt-free memory initialization.

Key Highlights:

💡 Modality-Aware MoE-LoRA: extends vanilla LoRA with convolutional experts and modality-specific routing for adaptive RGB-D feature fusion and efficient fine-tuning.
🧩 Gated Multi-Level Feature Fusion: hierarchically aggregates multi-scale encoder features with an adaptive gating mechanism to balance spatial details and semantic context.
🚀 Pseudo-Guided Initialization: bootstraps the memory bank using a coarse mask as a pseudo prior, enabling zero-shot VSOD without manual prompts.
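
As a rough intuition for the gating idea in the second highlight, the sketch below blends a detail-oriented feature vector with a context-oriented one using a sigmoid gate. It is a toy illustration in plain Python, not the module from the paper: the names `sigmoid` and `gated_fuse`, the per-element gate logits, and the use of flat lists in place of multi-scale feature maps are all assumptions made here for clarity.

```python
import math


def sigmoid(x: float) -> float:
    """Map a real-valued logit to a gate value between 0 and 1."""
    # Two branches keep math.exp from overflowing for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_fuse(detail, context, gate_logits):
    """Blend two equal-length feature vectors element by element.

    A gate near 1 keeps the detail feature; a gate near 0 keeps the
    context feature. All names here are hypothetical.
    """
    if not (len(detail) == len(context) == len(gate_logits)):
        raise ValueError("all inputs must have the same length")
    fused = []
    for d, c, logit in zip(detail, context, gate_logits):
        g = sigmoid(logit)
        fused.append(g * d + (1.0 - g) * c)
    return fused
```

A logit of zero weights both inputs equally, while strongly positive or negative logits select one input almost entirely. In a real network the logits would be predicted from the features themselves rather than supplied by hand.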

⚡ Getting Started

OS/Hardware Compatibility Note:

This codebase was developed and tested exclusively on Ubuntu/Linux. We strongly recommend using a Linux environment.

Please note that slight performance variations may occur due to differences in OS versions, GPU models, and CUDA drivers. We appreciate your understanding.

1. Environment Setup

# Enter the codebase directory
cd M4SAM_Code

# Create and activate conda environment
conda create -n m4sam python=3.10
conda activate m4sam

# Install PyTorch with CUDA support
conda install pytorch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 pytorch-cuda=12.4 -c pytorch -c nvidia

# Install other dependencies
pip install -r requirements.txt
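
After installation, a quick sanity check can confirm that PyTorch imports and sees a GPU. The snippet below is a minimal sketch; the helper name `describe_env` is hypothetical, and it only reports what it finds rather than enforcing the versions pinned above.

```python
import importlib
import sys


def describe_env():
    """Report the Python version, PyTorch version, and CUDA visibility.

    The PyTorch fields stay None when torch cannot be imported.
    """
    info = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "torch": None,
        "cuda_available": None,
    }
    try:
        torch = importlib.import_module("torch")
    except (ImportError, OSError):
        # torch is missing or its native libraries failed to load.
        return info
    info["torch"] = str(torch.__version__)
    info["cuda_available"] = bool(torch.cuda.is_available())
    return info


if __name__ == "__main__":
    # With the setup above, one would expect Python 3.10 and torch 2.5.1.
    for key, value in describe_env().items():
        print(f"{key}: {value}")
```

If `cuda_available` comes back as False on a GPU machine, the installed driver and the CUDA build of PyTorch are the first things to compare.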

2. Download SAM2 Pretrained Weights

The script below downloads sam2.1_hiera_large.pt from Meta AI into the checkpoints/ directory.

cd checkpoints
bash download_sam_ckpt.sh
cd ..

📊 Reproduce Results

We provide model checkpoints so that you can reproduce the metrics reported in our paper.

Download Artifacts & Prepare Datasets

Dataset	Source Repo	Checkpoint
RDVS	https://github.com/kerenfu/RDVS	M4SAM-rdvs.pth
ViDSOD-100	https://github.com/jhl-Det/RGBD_Video_SOD	M4SAM-vidsod.pth
DViSal	https://github.com/DVSOD/DVSOD-DViSal	M4SAM-dvisal.pth

Dataset Path Note: Please download the datasets from their official sources linked in the table above. Extract them into a single parent directory (e.g., /data). Your folder structure should look like this:
/data/
├── DViSal_dataset/
│   ├── data/
│   └── test_all.txt
├── RDVS/
│   ├── test/
│   └── train/
└── VidSOD/
    ├── test/
    └── train/
When running evaluation or training, the --test_image_path / --train_image_path argument should point to this parent directory (e.g., /data).
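
Before launching a run, it can help to confirm that the layout matches the tree above. The following sketch lists any expected entries that are missing under the parent directory; the helper name `missing_dataset_paths` is an assumption of this example, and the expected entries are copied from the tree shown above.

```python
import sys
from pathlib import Path

# Relative paths taken from the folder tree above.
EXPECTED_ENTRIES = (
    "DViSal_dataset/data",
    "DViSal_dataset/test_all.txt",
    "RDVS/test",
    "RDVS/train",
    "VidSOD/test",
    "VidSOD/train",
)


def missing_dataset_paths(data_root):
    """Return the expected entries that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in EXPECTED_ENTRIES if not (root / rel).exists()]


if __name__ == "__main__":
    # Optional first argument: the data parent directory (defaults to /data).
    root = sys.argv[1] if len(sys.argv) > 1 else "/data"
    missing = missing_dataset_paths(root)
    for rel in missing:
        print(f"missing: {Path(root) / rel}")
    print("layout OK" if not missing else f"{len(missing)} entries missing")
```

The check only looks at top-level entries; it does not validate the contents of each split.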

Place the downloaded checkpoints under the checkpoints/ directory:

checkpoints/
├── sam2.1_hiera_large.pt       # SAM2 (from Step 2)
├── M4SAM-dvisal.pth            # DViSal
├── M4SAM-rdvs.pth              # RDVS
└── M4SAM-vidsod.pth            # ViDSOD-100

Verify Results

You can run both inference and evaluation using the following parameterized bash script.

#!/bin/bash
# Run inference, then evaluation, for one dataset.
set -euo pipefail  # stop on the first error so evaluation never runs after a failed inference

vid_len=16
device=0
dataset="rdvs" # Options: "rdvs", "vidsod", "dvisal"
data_path="/data" # Update this to your local data parent directory
output_dir="./results/${dataset}_pred"

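# Set ground truth path based on dataset
if [ "$dataset" = "dvisal" ]; then
    gt_path="${data_path}/DViSal_dataset/data"
elif [ "$dataset" = "rdvs" ]; then
    gt_path="${data_path}/RDVS/test"
elif [ "$dataset" = "vidsod" ]; then
    gt_path="${data_path}/VidSOD/test"
else
    # Fail early instead of passing an empty --gt_path to eval_tool.py
    echo "Unknown dataset: $dataset (expected rdvs, vidsod, or dvisal)" >&2
    exit 1
fi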

echo "Step 1: Running inference..."
python test.py \
    --vid_len $vid_len \
    --device $device \
    --ckpt checkpoints/M4SAM-${dataset}.pth \
    --test_image_path "$data_path" \
    --dataset $dataset \
    --save_path "$output_dir" \
    --save 1

echo "Step 2: Evaluating..."
python eval_tool.py \
    --dataset $dataset \
    --pred_path "$output_dir" \
    --gt_path "$gt_path"
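
To evaluate all three datasets in one pass, the same path logic can be expressed as a lookup table. The sketch below builds the two command lines from the script above for a given dataset and rejects unknown names; `GT_SUBDIRS` and `build_eval_commands` are hypothetical names, while the flags, checkpoint naming, and ground-truth locations are copied from the script.

```python
# Ground-truth locations relative to the data parent directory,
# mirroring the if/elif chain in the bash script above.
GT_SUBDIRS = {
    "dvisal": "DViSal_dataset/data",
    "rdvs": "RDVS/test",
    "vidsod": "VidSOD/test",
}


def build_eval_commands(dataset, data_path="/data", vid_len=16, device=0):
    """Return (inference_cmd, eval_cmd) as argument lists for one dataset."""
    if dataset not in GT_SUBDIRS:
        raise ValueError(
            f"unknown dataset {dataset!r}; expected one of {sorted(GT_SUBDIRS)}"
        )
    output_dir = f"./results/{dataset}_pred"
    inference_cmd = [
        "python", "test.py",
        "--vid_len", str(vid_len),
        "--device", str(device),
        "--ckpt", f"checkpoints/M4SAM-{dataset}.pth",
        "--test_image_path", data_path,
        "--dataset", dataset,
        "--save_path", output_dir,
        "--save", "1",
    ]
    eval_cmd = [
        "python", "eval_tool.py",
        "--dataset", dataset,
        "--pred_path", output_dir,
        "--gt_path", f"{data_path}/{GT_SUBDIRS[dataset]}",
    ]
    return inference_cmd, eval_cmd
```

Looping over `GT_SUBDIRS` and passing each list to `subprocess.run(cmd, check=True)` would run the datasets back to back and stop at the first failing step.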

🏋️ Training

Training uses PyTorch DistributedDataParallel (DDP) across multiple GPUs; the example below uses GPUs 0 and 1.

#!/bin/bash

dataset="rdvs" # Options: "rdvs", "vidsod", "dvisal"
data_path="/data" # Update this to your local data parent directory

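# Set epoch based on dataset
if [ "$dataset" = "dvisal" ]; then
    epoch=50
elif [ "$dataset" = "rdvs" ]; then
    epoch=60
elif [ "$dataset" = "vidsod" ]; then
    epoch=30
else
    # Fail early instead of launching training with an empty --epoch
    echo "Unknown dataset: $dataset (expected rdvs, vidsod, or dvisal)" >&2
    exit 1
fi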

python train_ddp.py \
    --batch_size 4 \
    --device 0,1 \
    --epoch $epoch \
    --vid_len 4 \
    --conti 0 \
    --lr 0.001 \
    --sync_bn 1 \
    --dataset $dataset \
    --train_image_path "$data_path"

Acknowledgement

Our work would not have been possible without the following open-source projects:

Thanks for their great contributions!

Citation

If you find our work useful, please cite our paper:

@InProceedings{Liu_2026_CVPR,
    author    = {Liu, Jiyuan and Lin, Jia and Zhou, Xiaofei and Cong, Runmin and Liu, Deyang and Liu, Zhi},
    title     = {M4-SAM: Multi-Modal Mixture-of-Experts with Memory-Augmented SAM for RGB-D Video Salient Object Detection},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {24970-24979}
}

License

This project is licensed under the CC BY-NC 4.0 License.
