Global Prior Meets Local Consistency: Dual-Memory Augmented
Vision-Language-Action Model for Efficient Robotic Manipulation
CVPR 2026
Pengwei Xie4, Jianye HAO4, Liqiang Nie1
3Shenzhen Loop Area Institute 4Huawei Noah's Ark Lab
✉ Corresponding author
- [05/2026] 🔥 We release the Inference Code and Checkpoints on LIBERO.
- [02/2026] 🔥 OptimusVLA is accepted to CVPR 2026!
- [02/2026] 🔥 Project page released.
- [02/2026] 🔥 arXiv paper released.
Overview of the OptimusVLA framework. Given a task and the current observation, the Vision-Language backbone first encodes the inputs into a multimodal representation. The Global Prior Memory (GPM) then retrieves a task-level prior based on this representation, while the Local Consistency Memory (LCM) dynamically encodes the historical action sequence to produce a consistency constraint. Finally, the flow policy denoises the initialization with an adaptive schedule for the number of function evaluations (NFE) to generate the action chunk.
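To make the role of the NFE budget concrete, here is a minimal sketch of fixed-step Euler sampling for a flow policy. It is not the OptimusVLA implementation: the function names, the toy velocity field, and the time convention (t running from 0 to 1) are assumptions chosen for illustration only.

```python
from typing import Callable, List

Vector = List[float]


def euler_sample(
    x_init: Vector,
    velocity: Callable[[Vector, float], Vector],
    nfe: int,
) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with `nfe` Euler steps."""
    if nfe < 1:
        raise ValueError("nfe must be at least 1")
    dt = 1.0 / nfe
    x = list(x_init)
    for step in range(nfe):
        t = step * dt
        v = velocity(x, t)  # exactly one function evaluation per step
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_toy_velocity(target: Vector) -> Callable[[Vector, float], Vector]:
    """Hypothetical field that moves any start point to `target` on a straight line."""

    def velocity(x: Vector, t: float) -> Vector:
        # t never reaches 1.0 inside euler_sample, so the division is safe.
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]

    return velocity


# Assumed example values: an initialization that already lies close to the target.
target = [0.5, -0.2, 0.1]
init = [0.4, -0.1, 0.0]
for nfe in (1, 10):
    sample = euler_sample(init, make_toy_velocity(target), nfe)
    print(nfe, [round(s, 6) for s in sample])
```

In this toy case the path is a straight line, so the step count does not change the result. A learned velocity field is generally curved, so the step count trades accuracy against latency, and a better initialization can tolerate fewer steps.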

OptimusVLA is built on the openpi framework. Therefore, please first download and configure the openpi environment, and then download the pi_05 model weights.
- Clone the official OpenPI repository:
git clone --recurse-submodules git@github.com:Physical-Intelligence/openpi.git
cd openpi
- Create the main OpenPI environment:
GIT_LFS_SKIP_SMUDGE=1 uv sync
GIT_LFS_SKIP_SMUDGE=1 uv pip install -e .
- PyTorch support:
cp -r ./src/openpi/models_pytorch/transformers_replace/* .venv/lib/python3.11/site-packages/transformers/
- Download the pi05_libero model checkpoint and convert it to the PyTorch version:
OptimusVLA does not release the pi0.5 policy checkpoint. Prepare a PyTorch
pi05_libero checkpoint yourself. The policy directory must contain:
model.safetensors
assets/physical-intelligence/libero/norm_stats.json
If you start from a JAX OpenPI checkpoint, convert it with the upstream OpenPI converter (OPENPI_ROOT is the path to your OpenPI checkout):
cd "${OPENPI_ROOT}"
uv run examples/convert_jax_model_to_pytorch.py \
--checkpoint-dir /path/to/pi05_libero_jax_checkpoint \
--config-name pi05_libero \
--output-path /path/to/pi05_libero_pytorch
Set the policy path:
export POLICY_DIR=/path/to/pi05_libero_pytorch
- Install the extra OptimusVLA inference dependencies:
uv pip install faiss-cpu
uv pip install mamba-ssm
- Create the LIBERO client environment:
Create the LIBERO example environment using the official OpenPI instructions:
cd "${OPENPI_ROOT}"
uv venv --python 3.8 examples/libero/.venv
source examples/libero/.venv/bin/activate
uv pip sync examples/libero/requirements.txt third_party/libero/requirements.txt \
--extra-index-url https://download.pytorch.org/whl/cu113 \
--index-strategy=unsafe-best-match
uv pip install -e packages/openpi-client
uv pip install -e third_party/libero
deactivate
Use this environment only for the LIBERO client in examples/libero/main.py. Run the policy server from the OpenPI server environment created in step 2.
Clone or download the GitHub repository that contains this code/ folder, then
copy the overlay files from code/ into an already configured OpenPI checkout:
git clone https://github.com/iLearn-Lab/CVPR26-OptimusVLA.git
cd CVPR26-OptimusVLA
export OPENPI_ROOT=/path/to/openpi
rsync -av \
code/ "${OPENPI_ROOT}/"
This step only copies the source overlay. Download checkpoints/ and memory/ separately from the Hugging Face asset repository in the next step.
Download the asset directories from the OptimusVLA_Memory asset repository on Hugging Face. They must end up under the OpenPI root exactly as follows:
${OPENPI_ROOT}/checkpoints/gpm_task_head.pt
${OPENPI_ROOT}/checkpoints/lcm.pt
${OPENPI_ROOT}/memory/gpm_memory_meta.pt
${OPENPI_ROOT}/memory/gpm_memory.index
${OPENPI_ROOT}/memory/gpm_memory_actions.npz
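Before launching the evaluation, it can help to confirm that every expected file is in place. The following is a small sketch of such a check. The helper `find_missing` is hypothetical and not part of the release; the relative paths are copied from the layouts listed in this README.

```python
import os
from pathlib import Path
from typing import List

# Relative paths expected under ${OPENPI_ROOT}.
OPENPI_ASSETS = [
    "checkpoints/gpm_task_head.pt",
    "checkpoints/lcm.pt",
    "memory/gpm_memory_meta.pt",
    "memory/gpm_memory.index",
    "memory/gpm_memory_actions.npz",
]

# Relative paths expected under ${POLICY_DIR}.
POLICY_FILES = [
    "model.safetensors",
    "assets/physical-intelligence/libero/norm_stats.json",
]


def find_missing(root: Path, relative_paths: List[str]) -> List[str]:
    """Return the relative paths that do not exist as files under `root`."""
    return [rel for rel in relative_paths if not (root / rel).is_file()]


if __name__ == "__main__":
    # Falls back to the current directory when the variables are unset.
    checks = (
        ("OPENPI_ROOT", OPENPI_ASSETS),
        ("POLICY_DIR", POLICY_FILES),
    )
    for variable, paths in checks:
        root = Path(os.environ.get(variable, "."))
        missing = find_missing(root, paths)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{variable}={root} -> {status}")
```

The script only reports what is missing; it does not download or move anything.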
cd "${OPENPI_ROOT}"
POLICY_DIR=/path/to/pi05_libero_pytorch bash scripts/run_libero_eval.sh
By default, the script runs libero_spatial, libero_object, libero_goal, and libero_10. Logs are written to a timestamped directory under logs/, and the final summary is saved as results.txt in that directory. The summary contains the exit code, episode count, success count, and success rate for each suite.
Default model and server settings:
| Parameter | Default | Meaning |
|---|---|---|
| `POLICY_CONFIG` | `pi05_libero` | OpenPI policy config used for the base pi0.5 checkpoint. |
| `POLICY_DIR` | Required | Local PyTorch pi0.5 checkpoint directory. This release does not include it. |
| `ACTION_NORM_STATS_PATH` | `${POLICY_DIR}/assets/physical-intelligence/libero/norm_stats.json` | Action normalization stats used by GPM memory actions. |
| `HOST` / `PORT` | `127.0.0.1` / `8000` | Client connection target and server port. |
| `SERVER_CUDA_VISIBLE_DEVICES` | `${CUDA_VISIBLE_DEVICES}` or `0` | GPU used by the policy server. |
| `CLIENT_CUDA_VISIBLE_DEVICES` | `${CUDA_VISIBLE_DEVICES}` or `0` | GPU visible to the LIBERO clients. |
| `OPENPI_TORCH_COMPILE` | `0` | Disables Torch compile by default for easier first-run debugging. |
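The defaults in the table above can be read as a simple fallback chain over environment variables. The sketch below mirrors the table in Python for clarity. `resolve_server_settings` is a hypothetical helper, and the actual logic in scripts/run_libero_eval.sh may differ in detail, for example in how empty variables are treated.

```python
from typing import Dict, Mapping


def resolve_server_settings(env: Mapping[str, str]) -> Dict[str, str]:
    """Apply the documented defaults to an environment-like mapping."""
    policy_dir = env.get("POLICY_DIR")
    if not policy_dir:
        raise KeyError("POLICY_DIR is required and has no default")

    # Assumption: an empty CUDA_VISIBLE_DEVICES is treated like an unset one.
    shared_gpu = env.get("CUDA_VISIBLE_DEVICES") or "0"
    default_stats = f"{policy_dir}/assets/physical-intelligence/libero/norm_stats.json"

    return {
        "POLICY_CONFIG": env.get("POLICY_CONFIG", "pi05_libero"),
        "POLICY_DIR": policy_dir,
        "ACTION_NORM_STATS_PATH": env.get("ACTION_NORM_STATS_PATH", default_stats),
        "HOST": env.get("HOST", "127.0.0.1"),
        "PORT": env.get("PORT", "8000"),
        "SERVER_CUDA_VISIBLE_DEVICES": env.get("SERVER_CUDA_VISIBLE_DEVICES", shared_gpu),
        "CLIENT_CUDA_VISIBLE_DEVICES": env.get("CLIENT_CUDA_VISIBLE_DEVICES", shared_gpu),
        "OPENPI_TORCH_COMPILE": env.get("OPENPI_TORCH_COMPILE", "0"),
    }


# Example: server on GPU 1, clients fall back to the shared GPU setting.
settings = resolve_server_settings(
    {
        "POLICY_DIR": "/path/to/pi05_libero_pytorch",
        "CUDA_VISIBLE_DEVICES": "0",
        "SERVER_CUDA_VISIBLE_DEVICES": "1",
    }
)
for name, value in settings.items():
    print(f"{name}={value}")
```

Passing `os.environ` instead of a literal dictionary shows the values a run in the current shell would use under these assumptions.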
Default GPM settings:
| Parameter | Default | Meaning |
|---|---|---|
| GPM memory | Enabled | The helper always passes --use-memory. |
| GPM assets | checkpoints/gpm_task_head.pt, memory/gpm_memory_meta.pt, memory/gpm_memory.index, memory/gpm_memory_actions.npz | Default task head, metadata, FAISS index, and packed action memory paths. |
| `--action-use-quantile-norm` | Enabled | Uses LIBERO action quantile statistics for memory action normalization. |
| `MEMORY_TOP_K` | `8` | Number of retrieved memory candidates per query. |
| `MEMORY_REFRESH_EVERY` | `1` | Refreshes memory retrieval every replan request. |
| `ALIGN_MODE` | `hybrid` | Time alignment mode for the retrieved memory trajectory. |
| `MIXTURE_MODE` | `gaussian` | Builds a Gaussian action prior from retrieved memories. |
| `TEMPERATURE` | `10.0` | Softmax temperature used when weighting retrieved memories. |
| `SIGMA_MIN` | `0.05` | Lower bound for the Gaussian prior standard deviation. |
| `NOISE_MIN` / `NOISE_MAX` | `0.20` / `1.00` | Noise schedule range used by the memory-guided sampler. |
| `NFE_MIN` / `NFE_MAX` | `1` / `10` | Sampling step range selected from memory confidence. |
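To illustrate how these settings could interact, here is a rough sketch of building a Gaussian prior from retrieved candidates and choosing a step count from confidence. Everything in it is an assumption made for illustration: the released code may scale scores differently, define confidence differently, and map confidence to NFE and noise with another rule. Each candidate is treated here as a flat action vector.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float], temperature: float) -> List[float]:
    """Softmax over scores multiplied by `temperature` (assumed convention)."""
    scaled = [s * temperature for s in scores]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def gaussian_prior(
    candidates: Sequence[Sequence[float]],
    scores: Sequence[float],
    temperature: float = 10.0,
    sigma_min: float = 0.05,
) -> Tuple[List[float], List[float], List[float]]:
    """Return the weighted mean, floored standard deviation, and weights."""
    weights = softmax(scores, temperature)
    dims = range(len(candidates[0]))
    mean = [sum(w * c[d] for w, c in zip(weights, candidates)) for d in dims]
    var = [sum(w * (c[d] - mean[d]) ** 2 for w, c in zip(weights, candidates)) for d in dims]
    std = [max(math.sqrt(v), sigma_min) for v in var]
    return mean, std, weights


def select_nfe(confidence: float, nfe_min: int = 1, nfe_max: int = 10) -> int:
    """Assumed rule: higher confidence means fewer sampling steps."""
    confidence = min(max(confidence, 0.0), 1.0)
    return nfe_min + round((1.0 - confidence) * (nfe_max - nfe_min))


def select_noise(confidence: float, noise_min: float = 0.20, noise_max: float = 1.00) -> float:
    """Assumed rule: higher confidence means a smaller noise scale."""
    confidence = min(max(confidence, 0.0), 1.0)
    return noise_min + (1.0 - confidence) * (noise_max - noise_min)


# Hypothetical retrieval result: two candidate actions with similarity scores.
mean, std, weights = gaussian_prior([[0.0, 1.0], [1.0, 1.0]], [0.9, 0.5])
confidence = max(weights)  # assumed proxy for memory confidence
print(mean, std, select_nfe(confidence), round(select_noise(confidence), 3))
```

Under these assumptions, a query whose best match dominates the softmax yields a tight prior, few sampling steps, and little injected noise, while ambiguous retrievals fall back toward the full step count.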
Default LCM settings:
| Parameter | Default | Meaning |
|---|---|---|
| `USE_LCM` | `1` | Enables LCM refinement after GPM. Set USE_LCM=0 for GPM-only inference. |
| `LCM_SCALE` | `0.10` | Strength of the LCM correction applied to the action chunk. |
| LCM checkpoint | checkpoints/lcm.pt | Default LCM checkpoint path. |
| LCM architecture fallback | hidden 256, layers 1, heads 4, dropout 0.0, mamba_impl=auto | Used only when these fields are absent from checkpoint metadata. |
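One plausible reading of `LCM_SCALE` is a scaled additive correction on top of the GPM-guided action chunk. The sketch below shows that reading only. The additive form and the function name are assumptions, not a description of the released LCM module.

```python
from typing import List

Chunk = List[List[float]]  # one row per timestep, one column per action dimension


def apply_lcm_correction(chunk: Chunk, correction: Chunk, lcm_scale: float = 0.10) -> Chunk:
    """Return chunk + lcm_scale * correction, element-wise (assumed form)."""
    if len(chunk) != len(correction):
        raise ValueError("chunk and correction must have the same number of timesteps")
    refined = []
    for actions, deltas in zip(chunk, correction):
        if len(actions) != len(deltas):
            raise ValueError("action dimensions must match")
        refined.append([a + lcm_scale * d for a, d in zip(actions, deltas)])
    return refined


# Hypothetical two-step chunk with a two-dimensional action.
chunk = [[1.0, 2.0], [1.5, 2.5]]
correction = [[1.0, -1.0], [0.0, 2.0]]
print(apply_lcm_correction(chunk, correction))
```

With this form, a scale of 0 leaves the chunk unchanged, which matches the intent of GPM-only inference.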
Default evaluation and logging settings:
| Parameter | Default | Meaning |
|---|---|---|
| Suites | libero_spatial, libero_object, libero_goal, libero_10 | Four standard LIBERO suites run in parallel. |
| `NUM_TRIALS_PER_TASK` | `50` | Episodes evaluated per task. |
| `REPLAN_STEPS` | `10` | Number of actions executed before requesting a new chunk. |
| `NUM_STEPS_WAIT` | `10` | Initial dummy steps before policy control begins. |
| `SEED` | `7` | LIBERO environment seed. |
| `RESIZE_SIZE` | `224` | Image size sent to the policy client. |
| `MUJOCO_GL` | `egl` | Headless MuJoCo rendering backend. |
| `LOG_DIR` | `logs/libero_eval_<timestamp>` | Directory for server logs, client stdout logs, JSONL records, and results.txt. |
| `RESULTS_TXT` | `${LOG_DIR}/results.txt` | Final text summary with per-suite and overall success rates. |
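The interaction between `NUM_STEPS_WAIT` and `REPLAN_STEPS` can be sketched as a simple control loop. This is a simplified simulation rather than the code in examples/libero/main.py: `request_chunk` stands in for the call to the policy server, and the environment step is omitted.

```python
from collections import deque
from typing import Callable, Deque, List

Action = List[float]


def run_episode(
    request_chunk: Callable[[int], List[Action]],
    max_steps: int,
    replan_steps: int = 10,
    num_steps_wait: int = 10,
) -> int:
    """Simulate one episode and return how many action chunks were requested."""
    plan: Deque[Action] = deque()
    requests = 0
    for t in range(max_steps + num_steps_wait):
        if t < num_steps_wait:
            # A real client sends a dummy action here while the scene settles.
            continue
        if not plan:
            chunk = request_chunk(t)
            if len(chunk) < replan_steps:
                raise ValueError("policy returned fewer actions than replan_steps")
            plan.extend(chunk[:replan_steps])  # execute only the first replan_steps actions
            requests += 1
        plan.popleft()  # a real client would pass this action to the environment
    return requests


def fake_policy(step: int) -> List[Action]:
    """Hypothetical policy that always returns a 10-step chunk of zero actions."""
    return [[0.0] * 7 for _ in range(10)]


print(run_episode(fake_policy, max_steps=25))
```

A smaller `REPLAN_STEPS` makes the policy react to new observations sooner at the cost of more inference calls per episode.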
Our repeated experiments show that LIBERO performance varies with the GPU model and with run-to-run randomness. For reference, the table below lists results from eight repeated runs each on L40 and A800 GPUs with the default parameter settings.
| Method | Spatial | Object | Goal | Long | Average |
|---|---|---|---|---|---|
| pi_05 | 98.8 | 98.2 | 98.0 | 92.4 | 96.9 |
| L40 | 99.4 | 99.2 | 98.8 | 95.6 | 98.3 |
| L40 | 98.6 | 99.8 | 97.8 | 94.6 | 97.7 |
| L40 | 97.8 | 99.6 | 98.6 | 92.8 | 97.2 |
| L40 | 98.4 | 99.0 | 98.0 | 95.2 | 97.7 |
| L40 | 98.6 | 99.4 | 98.0 | 94.6 | 97.7 |
| L40 | 98.8 | 98.8 | 97.2 | 95.2 | 97.5 |
| L40 | 98.4 | 99.8 | 97.4 | 94.0 | 97.4 |
| L40 | 98.4 | 99.4 | 97.6 | 95.0 | 97.6 |
| A800 | 97.6 | 99.4 | 96.8 | 95.6 | 97.4 |
| A800 | 97.8 | 99.4 | 98.2 | 95.2 | 97.7 |
| A800 | 99.6 | 99.2 | 99.0 | 94.4 | 98.1 |
| A800 | 98.4 | 99.6 | 97.8 | 94.8 | 97.7 |
| A800 | 99.0 | 99.6 | 97.0 | 94.2 | 97.5 |
| A800 | 99.2 | 99.2 | 97.4 | 94.0 | 97.5 |
| A800 | 98.6 | 99.8 | 98.8 | 96.4 | 98.4 |
| A800 | 99.2 | 99.2 | 98.4 | 94.0 | 97.7 |
| Reported in Paper | 99.6 | 99.8 | 98.4 | 96.4 | 98.6 |
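To compare your own runs against this spread, a short script can summarize the Average column per GPU. The numbers below are copied from the table; the helper `summarize` is only an illustrative sketch.

```python
from statistics import mean, stdev
from typing import Dict, List

# Values from the "Average" column of the table above.
RUN_AVERAGES: Dict[str, List[float]] = {
    "L40": [98.3, 97.7, 97.2, 97.7, 97.7, 97.5, 97.4, 97.6],
    "A800": [97.4, 97.7, 98.1, 97.7, 97.5, 97.5, 98.4, 97.7],
}
PI05_AVERAGE = 96.9
PAPER_AVERAGE = 98.6


def summarize(values: List[float]) -> Dict[str, float]:
    """Return the mean, sample standard deviation, minimum, and maximum."""
    return {
        "mean": mean(values),
        "std": stdev(values),
        "min": min(values),
        "max": max(values),
    }


for gpu, values in RUN_AVERAGES.items():
    stats = summarize(values)
    print(
        f"{gpu}: mean={stats['mean']:.2f} std={stats['std']:.2f} "
        f"range=[{stats['min']}, {stats['max']}]"
    )
```

In the listed runs, every per-run average lies above the pi_05 baseline average and at or below the average reported in the paper.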
We evaluate OptimusVLA on generalization tasks and long-horizon tasks using the GALAXEA R1 Lite robot.

If you find this work useful for your research, please kindly cite our paper:
@article{li2026optimusvla,
title={Global Prior Meets Local Consistency: Dual-Memory Augmented Vision-Language-Action Model for Efficient Robotic Manipulation},
author={Zaijing Li and Bing Hu and Rui Shao and Gongwei Chen and Dongmei Jiang and Pengwei Xie and Jianye Hao and Liqiang Nie},
journal={arXiv preprint arXiv:2602.20200},
year={2026}
}