
iLearn-Lab/CVPR26-OptimusVLA


Global Prior Meets Local Consistency: Dual-Memory Augmented
Vision-Language-Action Model for Efficient Robotic Manipulation
CVPR 2026

¹Harbin Institute of Technology, Shenzhen · ²PengCheng Laboratory, Shenzhen · ³Shenzhen Loop Area Institute · ⁴Huawei Noah's Ark Lab
✉ Corresponding author

🆕 Updates

🎈 OptimusVLA Framework

Overview of the OptimusVLA framework. Given a task instruction and the current observation, the vision-language backbone first encodes the inputs into a multimodal representation. GPM then retrieves a task-level prior based on this representation, while LBM dynamically encodes the historical action sequence to produce a consistency constraint. Finally, the flow policy denoises the initialization using an adaptive schedule for the number of function evaluations (NFEs) to generate the action chunk.
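The final stage of the overview, denoising an initialization with a variable number of function evaluations, can be illustrated with a generic explicit-Euler integrator for a flow-matching velocity field. This is a minimal sketch, not OptimusVLA's sampler: `euler_flow_sample` is a hypothetical helper, the toy velocity field stands in for the policy network, and the time convention (t = 0 at the initialization, t = 1 at the action) is an assumption.

```python
from typing import Callable

import numpy as np


def euler_flow_sample(velocity_fn: Callable[[np.ndarray, float], np.ndarray],
                      x_init: np.ndarray, num_steps: int) -> np.ndarray:
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with explicit Euler.

    Each step calls velocity_fn once, so num_steps equals the number of
    function evaluations (NFEs) spent on one action chunk.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be >= 1")
    x = np.array(x_init, dtype=np.float64)  # copy so the caller's array is untouched
    dt = 1.0 / num_steps
    for i in range(num_steps):
        x = x + dt * velocity_fn(x, i * dt)
    return x


if __name__ == "__main__":
    # Toy velocity field (stand-in for the policy network): dx/dt = -x.
    # The exact value at t=1 is x0 * exp(-1); the Euler error shrinks as NFEs grow.
    x0 = np.ones((4, 7))  # hypothetical chunk: 4 steps x 7 action dims
    exact = x0 * np.exp(-1.0)
    for nfe in (1, 2, 10):
        approx = euler_flow_sample(lambda x, t: -x, x0, nfe)
        print(nfe, float(np.abs(approx - exact).max()))
```

Each step costs one forward pass of the velocity network, so fewer steps are cheaper but integrate less accurately. An adaptive schedule can spend few steps when the initialization is already close to the target and more steps otherwise.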

🚀 How to Run

OptimusVLA is built on the openpi framework. Install and configure the openpi environment first, then prepare the pi0.5 (pi05_libero) model weights.

Install openpi

  1. Clone the official OpenPI repository:
git clone --recurse-submodules git@github.com:Physical-Intelligence/openpi.git
cd openpi
  2. Create the main OpenPI environment:
GIT_LFS_SKIP_SMUDGE=1 uv sync
GIT_LFS_SKIP_SMUDGE=1 uv pip install -e .
  3. Add PyTorch support by copying OpenPI's replacement transformers files into the environment:
cp -r ./src/openpi/models_pytorch/transformers_replace/* .venv/lib/python3.11/site-packages/transformers/
  4. Download the pi05_libero model checkpoint and convert it to PyTorch:

OptimusVLA does not release the pi0.5 policy checkpoint. Prepare a PyTorch pi05_libero checkpoint yourself. The policy directory must contain:

model.safetensors
assets/physical-intelligence/libero/norm_stats.json

If you start from a JAX OpenPI checkpoint, convert it with the upstream OpenPI converter. Here and below, OPENPI_ROOT is the path to your OpenPI checkout:

cd "${OPENPI_ROOT}"
uv run examples/convert_jax_model_to_pytorch.py \
  --checkpoint-dir /path/to/pi05_libero_jax_checkpoint \
  --config-name pi05_libero \
  --output-path /path/to/pi05_libero_pytorch

Set the policy path:

export POLICY_DIR=/path/to/pi05_libero_pytorch
  5. Install the extra OptimusVLA inference dependencies:
uv pip install faiss-cpu
uv pip install mamba-ssm
  6. Create the LIBERO client environment:

Create the LIBERO example environment using the official OpenPI instructions:

cd "${OPENPI_ROOT}"
uv venv --python 3.8 examples/libero/.venv
source examples/libero/.venv/bin/activate
uv pip sync examples/libero/requirements.txt third_party/libero/requirements.txt \
  --extra-index-url https://download.pytorch.org/whl/cu113 \
  --index-strategy=unsafe-best-match
uv pip install -e packages/openpi-client
uv pip install -e third_party/libero
deactivate

Use this environment only for the LIBERO client in examples/libero/main.py. Run the policy server from the OpenPI server environment created in step 2.

Apply the OptimusVLA Code

Clone (or download) this repository, then copy the overlay files in its code/ folder into an already configured OpenPI checkout:

git clone https://github.com/iLearn-Lab/CVPR26-OptimusVLA.git
cd CVPR26-OptimusVLA
export OPENPI_ROOT=/path/to/openpi
rsync -av \
  code/ "${OPENPI_ROOT}/"

This step only copies the source overlay. Download checkpoints/ and memory/ separately from the Hugging Face asset repository in the next step.

Download OptimusVLA Assets

Download the checkpoints/ and memory/ directories from the OptimusVLA_Memory repository on Hugging Face. They must end up under the OpenPI root exactly as follows:

${OPENPI_ROOT}/checkpoints/gpm_task_head.pt
${OPENPI_ROOT}/checkpoints/lcm.pt
${OPENPI_ROOT}/memory/gpm_memory_meta.pt
${OPENPI_ROOT}/memory/gpm_memory.index
${OPENPI_ROOT}/memory/gpm_memory_actions.npz

Run LIBERO Evaluation

cd "${OPENPI_ROOT}"
POLICY_DIR=/path/to/pi05_libero_pytorch bash scripts/run_libero_eval.sh

By default, the script runs libero_spatial, libero_object, libero_goal, and libero_10. Logs are written to a timestamped directory under logs/, and the final summary is saved as results.txt in that directory. The summary contains the exit code, episode count, success count, and success rate for each suite.
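The per-suite numbers in results.txt are related by simple arithmetic: each of these four LIBERO suites has 10 tasks, so with NUM_TRIALS_PER_TASK=50 a suite runs 500 episodes. The sketch below does not parse the script's logs or JSONL records; `SuiteResult` and its field names are hypothetical, and whether the script's overall figure pools episodes or averages suite rates is an assumption (the two coincide when every suite runs the same number of episodes).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SuiteResult:
    """Per-suite counts of the kind results.txt reports (field names are hypothetical)."""
    suite: str
    episodes: int
    successes: int

    @property
    def success_rate(self) -> float:
        return 100.0 * self.successes / self.episodes if self.episodes else 0.0


def overall_success_rate(results: List[SuiteResult]) -> float:
    """Pool episodes across suites; equals the mean of suite rates when episode counts match."""
    episodes = sum(r.episodes for r in results)
    successes = sum(r.successes for r in results)
    return 100.0 * successes / episodes if episodes else 0.0


if __name__ == "__main__":
    episodes_per_suite = 10 * 50  # 10 tasks x NUM_TRIALS_PER_TASK
    # Hypothetical success counts for illustration, not real results.
    results = [
        SuiteResult("libero_spatial", episodes_per_suite, 490),
        SuiteResult("libero_10", episodes_per_suite, 470),
    ]
    for r in results:
        print(f"{r.suite}: {r.successes}/{r.episodes} = {r.success_rate:.1f}%")
    print(f"overall: {overall_success_rate(results):.1f}%")
```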

Default model and server settings:

| Parameter | Default | Meaning |
|---|---|---|
| POLICY_CONFIG | pi05_libero | OpenPI policy config used for the base pi0.5 checkpoint. |
| POLICY_DIR | Required | Local PyTorch pi0.5 checkpoint directory. This release does not include it. |
| ACTION_NORM_STATS_PATH | ${POLICY_DIR}/assets/physical-intelligence/libero/norm_stats.json | Action normalization statistics applied to GPM memory actions. |
| HOST / PORT | 127.0.0.1 / 8000 | Client connection target and server port. |
| SERVER_CUDA_VISIBLE_DEVICES | ${CUDA_VISIBLE_DEVICES} or 0 | GPU used by the policy server. |
| CLIENT_CUDA_VISIBLE_DEVICES | ${CUDA_VISIBLE_DEVICES} or 0 | GPU visible to the LIBERO clients. |
| OPENPI_TORCH_COMPILE | 0 | Disables torch.compile by default to simplify first-run debugging. |
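ACTION_NORM_STATS_PATH and the --action-use-quantile-norm flag point to quantile normalization of actions. The sketch below follows the quantile scheme used in openpi's normalization transforms (map each dimension's [q01, q99] range onto [-1, 1]) and assumes norm_stats.json stores these under `{"norm_stats": {"actions": {...}}}`; both the file layout and the exact formula used for GPM memory actions are assumptions, and the function names are hypothetical.

```python
import json
from typing import Tuple

import numpy as np


def load_action_quantiles(path: str) -> Tuple[np.ndarray, np.ndarray]:
    """Read action q01/q99, assuming a {"norm_stats": {"actions": {...}}} layout."""
    with open(path) as f:
        stats = json.load(f)["norm_stats"]["actions"]
    return np.asarray(stats["q01"], dtype=np.float64), np.asarray(stats["q99"], dtype=np.float64)


def quantile_normalize(x, q01, q99, eps=1e-6):
    """Map the [q01, q99] range of each action dimension linearly onto [-1, 1]."""
    x = np.asarray(x, dtype=np.float64)
    d = x.shape[-1]
    lo, hi = q01[..., :d], q99[..., :d]  # use only as many dims as the action has
    return (x - lo) / (hi - lo + eps) * 2.0 - 1.0


def quantile_unnormalize(y, q01, q99, eps=1e-6):
    """Inverse of quantile_normalize."""
    y = np.asarray(y, dtype=np.float64)
    d = y.shape[-1]
    lo, hi = q01[..., :d], q99[..., :d]
    return (y + 1.0) / 2.0 * (hi - lo + eps) + lo
```

For example, load the statistics from `os.path.join(os.environ["POLICY_DIR"], "assets/physical-intelligence/libero/norm_stats.json")` and pass a (horizon, action_dim) chunk to `quantile_normalize`. Values outside [q01, q99] map outside [-1, 1] rather than being clipped.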

Default GPM settings:

| Parameter | Default | Meaning |
|---|---|---|
| GPM memory | Enabled | The evaluation script always passes --use-memory. |
| GPM assets | checkpoints/gpm_task_head.pt, memory/gpm_memory_meta.pt, memory/gpm_memory.index, memory/gpm_memory_actions.npz | Default paths for the task head, memory metadata, FAISS index, and packed action memory. |
| --action-use-quantile-norm | Enabled | Normalizes memory actions with the LIBERO action quantile statistics. |
| MEMORY_TOP_K | 8 | Number of memory candidates retrieved per query. |
| MEMORY_REFRESH_EVERY | 1 | Re-runs memory retrieval on every replanning request. |
| ALIGN_MODE | hybrid | Time-alignment mode for the retrieved memory trajectory. |
| MIXTURE_MODE | gaussian | Builds a Gaussian action prior from the retrieved memories. |
| TEMPERATURE | 10.0 | Softmax temperature used to weight the retrieved memories. |
| SIGMA_MIN | 0.05 | Lower bound on the standard deviation of the Gaussian prior. |
| NOISE_MIN / NOISE_MAX | 0.20 / 1.00 | Noise range used by the memory-guided sampler. |
| NFE_MIN / NFE_MAX | 1 / 10 | Range for the number of sampling steps (NFEs), selected from memory confidence. |
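The GPM parameters above describe a pipeline of the form: weight the top-k retrieved action chunks with a temperature softmax, summarize them as a Gaussian prior whose standard deviation is floored at SIGMA_MIN, and derive a sampling budget from how confident the retrieval is. The sketch below is a hypothetical illustration of that pipeline using the README's default values, not the repository's implementation: the function names, the textbook `scores / temperature` convention, the confidence measure (how far the largest weight is from uniform), and the linear mapping onto [NFE_MIN, NFE_MAX] and [NOISE_MIN, NOISE_MAX] are all assumptions.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    z = x - np.max(x)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def gaussian_prior(candidates, scores, temperature=10.0, sigma_min=0.05):
    """Weighted mean/std over retrieved action chunks (hypothetical helper).

    candidates: (k, horizon, action_dim) retrieved, already-normalized action chunks.
    scores:     (k,) retrieval similarities, higher means a closer match.
    """
    cands = np.asarray(candidates, dtype=np.float64)
    # Textbook temperature softmax (scores / T); the repository may scale scores differently.
    weights = softmax(np.asarray(scores, dtype=np.float64) / temperature)
    mean = np.tensordot(weights, cands, axes=1)                # (horizon, action_dim)
    var = np.tensordot(weights, (cands - mean) ** 2, axes=1)   # weighted variance
    std = np.maximum(np.sqrt(var), sigma_min)                  # SIGMA_MIN floor
    return mean, std, weights


def schedule_from_confidence(weights, nfe_min=1, nfe_max=10, noise_min=0.20, noise_max=1.00):
    """Hypothetical mapping: confident retrieval -> fewer steps and less noise."""
    k = len(weights)
    if k <= 1:
        confidence = 1.0
    else:
        # 0 when weights are uniform, 1 when a single memory takes all the weight.
        confidence = (float(np.max(weights)) - 1.0 / k) / (1.0 - 1.0 / k)
    confidence = float(np.clip(confidence, 0.0, 1.0))
    nfe = int(round(nfe_max - confidence * (nfe_max - nfe_min)))
    noise = noise_max - confidence * (noise_max - noise_min)
    return nfe, noise, confidence


if __name__ == "__main__":
    rng = np.random.default_rng(7)
    cands = rng.normal(size=(8, 10, 7))     # MEMORY_TOP_K=8 hypothetical chunks
    scores = rng.uniform(0.0, 1.0, size=8)  # hypothetical similarity scores
    mean, std, w = gaussian_prior(cands, scores)
    nfe, noise, conf = schedule_from_confidence(w)
    print(mean.shape, bool(std.min() >= 0.05), nfe, round(noise, 2))
```

The design intent illustrated here is that agreement among retrieved memories tightens the prior and shrinks the step budget, while ambiguous retrieval falls back toward NFE_MAX and NOISE_MAX.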

Default LCM settings:

| Parameter | Default | Meaning |
|---|---|---|
| USE_LCM | 1 | Enables LCM refinement after GPM. Set USE_LCM=0 for GPM-only inference. |
| LCM_SCALE | 0.10 | Strength of the LCM correction applied to the action chunk. |
| LCM checkpoint | checkpoints/lcm.pt | Default LCM checkpoint path. |
| LCM architecture fallback | hidden 256, layers 1, heads 4, dropout 0.0, mamba_impl=auto | Used only when these fields are absent from the checkpoint metadata. |
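Two LCM settings lend themselves to a short illustration: the architecture fallback (checkpoint metadata wins, defaults fill only absent fields) and USE_LCM/LCM_SCALE controlling how strongly a correction is applied. This is a hypothetical sketch: the metadata key names mirror the table but are assumptions, and the additive form `chunk + LCM_SCALE * correction` is an illustrative reading of "strength of the LCM correction", not the confirmed update rule.

```python
from typing import Any, Dict, Optional

import numpy as np

# Fallback architecture from the table above; key names are hypothetical.
LCM_FALLBACK: Dict[str, Any] = {
    "hidden": 256,
    "layers": 1,
    "heads": 4,
    "dropout": 0.0,
    "mamba_impl": "auto",
}


def resolve_lcm_config(checkpoint_meta: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Use checkpoint metadata where present; fall back only for absent fields."""
    meta = checkpoint_meta or {}
    return {key: meta.get(key, default) for key, default in LCM_FALLBACK.items()}


def apply_lcm(chunk: np.ndarray, correction: np.ndarray, scale: float = 0.10,
              use_lcm: bool = True) -> np.ndarray:
    """Hypothetical additive refinement: chunk + LCM_SCALE * correction."""
    if not use_lcm:  # USE_LCM=0: keep the GPM-only action chunk
        return chunk
    return chunk + scale * correction
```

For instance, `resolve_lcm_config({"hidden": 512})` keeps the checkpoint's hidden size and takes the remaining four fields from the fallback.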

Default evaluation and logging settings:

| Parameter | Default | Meaning |
|---|---|---|
| Suites | libero_spatial, libero_object, libero_goal, libero_10 | Four standard LIBERO suites, run in parallel. |
| NUM_TRIALS_PER_TASK | 50 | Episodes evaluated per task. |
| REPLAN_STEPS | 10 | Number of actions executed before requesting a new chunk. |
| NUM_STEPS_WAIT | 10 | Initial dummy steps before policy control begins. |
| SEED | 7 | LIBERO environment seed. |
| RESIZE_SIZE | 224 | Image size sent to the policy client. |
| MUJOCO_GL | egl | Headless MuJoCo rendering backend. |
| LOG_DIR | logs/libero_eval_<timestamp> | Directory for server logs, client stdout logs, JSONL records, and results.txt. |
| RESULTS_TXT | ${LOG_DIR}/results.txt | Final text summary with per-suite and overall success rates. |
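REPLAN_STEPS and NUM_STEPS_WAIT describe a receding-horizon control loop: after a few dummy steps, the client requests an action chunk, executes only its first REPLAN_STEPS actions, and then asks for a new chunk. The sketch below shows that loop under stated assumptions: `run_episode`, the policy and environment callables, and the dummy action are hypothetical stand-ins, not the code in examples/libero/main.py.

```python
from collections import deque
from typing import Any, Callable, Dict, Sequence


def run_episode(policy: Callable[[int], Sequence[Any]],
                env_step: Callable[[Any], bool],
                dummy_action: Any,
                max_steps: int,
                replan_steps: int = 10,
                num_steps_wait: int = 10) -> Dict[str, Any]:
    """Execute replan_steps actions per chunk, then re-query the policy."""
    plan: deque = deque()
    queries = 0
    for t in range(num_steps_wait + max_steps):
        if t < num_steps_wait:
            env_step(dummy_action)  # NUM_STEPS_WAIT: scene settles, policy not queried
            continue
        if not plan:
            chunk = policy(t)  # stand-in for a request to the policy server
            queries += 1
            plan.extend(chunk[:replan_steps])  # keep only the first REPLAN_STEPS actions
        if env_step(plan.popleft()):  # env_step returns True on task success
            return {"success": True, "steps": t + 1, "policy_queries": queries}
    return {"success": False, "steps": num_steps_wait + max_steps, "policy_queries": queries}


if __name__ == "__main__":
    def fake_policy(t):
        return [f"a{t}+{i}" for i in range(50)]  # hypothetical 50-action chunk

    def fake_env_step(action):
        return False  # hypothetical environment that never reports success

    print(run_episode(fake_policy, fake_env_step, dummy_action="noop", max_steps=100))
```

With max_steps=100 and the defaults, the policy is queried 10 times, which shows how REPLAN_STEPS trades reactivity against the number of server requests.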

Experimental Results Note

Repeated experiments show that LIBERO success rates vary with the GPU model and with run-to-run randomness. For reference, the table below lists success rates (%) from repeated runs on L40 and A800 GPUs with the default parameter settings.

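| Method / GPU | Spatial | Object | Goal | Long | Average |
|---|---|---|---|---|---|
| pi_05 | 98.8 | 98.2 | 98.0 | 92.4 | 96.9 |
| L40, run 1 | 99.4 | 99.2 | 98.8 | 95.6 | 98.3 |
| L40, run 2 | 98.6 | 99.8 | 97.8 | 94.6 | 97.7 |
| L40, run 3 | 97.8 | 99.6 | 98.6 | 92.8 | 97.2 |
| L40, run 4 | 98.4 | 99.0 | 98.0 | 95.2 | 97.7 |
| L40, run 5 | 98.6 | 99.4 | 98.0 | 94.6 | 97.7 |
| L40, run 6 | 98.8 | 98.8 | 97.2 | 95.2 | 97.5 |
| L40, run 7 | 98.4 | 99.8 | 97.4 | 94.0 | 97.4 |
| L40, run 8 | 98.4 | 99.4 | 97.6 | 95.0 | 97.6 |
| A800, run 1 | 97.6 | 99.4 | 96.8 | 95.6 | 97.4 |
| A800, run 2 | 97.8 | 99.4 | 98.2 | 95.2 | 97.7 |
| A800, run 3 | 99.6 | 99.2 | 99.0 | 94.4 | 98.1 |
| A800, run 4 | 98.4 | 99.6 | 97.8 | 94.8 | 97.7 |
| A800, run 5 | 99.0 | 99.6 | 97.0 | 94.2 | 97.5 |
| A800, run 6 | 99.2 | 99.2 | 97.4 | 94.0 | 97.5 |
| A800, run 7 | 98.6 | 99.8 | 98.8 | 96.4 | 98.4 |
| A800, run 8 | 99.2 | 99.2 | 98.4 | 94.0 | 97.7 |
| Reported in paper | 99.6 | 99.8 | 98.4 | 96.4 | 98.6 |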
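To summarize the run-to-run spread, the Average column of the repeated runs can be reduced to a mean and sample standard deviation per GPU. This short script only recomputes statistics from the numbers in the results table; the variable names are illustrative.

```python
from statistics import mean, stdev

# "Average" column of the repeated runs in the results table (success rate, %).
RUNS = {
    "L40": [98.3, 97.7, 97.2, 97.7, 97.7, 97.5, 97.4, 97.6],
    "A800": [97.4, 97.7, 98.1, 97.7, 97.5, 97.5, 98.4, 97.7],
}
PI05_AVERAGE = 96.9
PAPER_AVERAGE = 98.6

if __name__ == "__main__":
    for gpu, runs in RUNS.items():
        m = mean(runs)
        print(f"{gpu}: mean {m:.2f}, stdev {stdev(runs):.2f}, range {min(runs)}-{max(runs)}, "
              f"vs pi_05 {m - PI05_AVERAGE:+.2f}, vs paper {m - PAPER_AVERAGE:+.2f}")
```

Across the eight runs per GPU, the mean average success rate is about 97.64% on L40 and 97.75% on A800.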

😸 Evaluation results on Real World

We evaluate OptimusVLA on generalization tasks and long-horizon tasks using a GALAXEA R1 Lite robot.

🤗 Citation

If you find this work useful for your research, please kindly cite our paper:

@article{li2026optimusvla,
  title={Global Prior Meets Local Consistency: Dual-Memory Augmented Vision-Language-Action Model for Efficient Robotic Manipulation},
  author={Zaijing Li and Bing Hu and Rui Shao and Gongwei Chen and Dongmei Jiang and Pengwei Xie and Jianye Hao and Liqiang Nie},
  journal={arXiv preprint arXiv:2602.20200},
  year={2026}
}

About

[CVPR 2026] Official Implementation for Global Prior Meets Local Consistency: Dual-Memory Augmented Vision-Language-Action Model for Efficient Robotic Manipulation
