Are you tired of constantly switching between code files, papers, and tutorials?
Frustrated by endless dependency installations and environment conflicts—when all you want is to run simple demos and learn from the code?
Imitation Policy Minimal is a clean, educational implementation of imitation learning-based policies for embodied AI.
This project integrates both Diffusion Policy and Flow Matching for simple control tasks like Pendulum-v1.
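As a rough illustration of how the two policy families differ at training time, the sketch below builds one training pair for each. It is not taken from this repository: the function names are hypothetical, and the convention that t=0 is pure noise and t=1 is the expert action is an assumption (one common choice for flow matching). It uses only the standard library so it runs without any of the project dependencies.

```python
import math
import random


def flow_matching_pair(action, noise, t):
    """Linear-interpolation flow matching pair (assumed convention: t=0 noise, t=1 data).

    Returns the interpolated sample x_t and the velocity target (action - noise).
    """
    x_t = [(1.0 - t) * n + t * a for a, n in zip(action, noise)]
    target_velocity = [a - n for a, n in zip(action, noise)]
    return x_t, target_velocity


def ddpm_pair(action, noise, alpha_bar_t):
    """DDPM-style forward noising pair.

    Returns the noised sample x_t and the noise itself, which is the
    regression target in the noise-prediction formulation.
    """
    scale_signal = math.sqrt(alpha_bar_t)
    scale_noise = math.sqrt(1.0 - alpha_bar_t)
    x_t = [scale_signal * a + scale_noise * n for a, n in zip(action, noise)]
    return x_t, list(noise)


rng = random.Random(0)
expert_action = [0.5]            # Pendulum-v1 actions are 1-D torques
noise = [rng.gauss(0.0, 1.0)]    # one Gaussian sample per action dimension

x_t, velocity = flow_matching_pair(expert_action, noise, t=0.25)
x_noisy, eps = ddpm_pair(expert_action, noise, alpha_bar_t=0.9)
print("flow matching:", x_t, velocity)
print("diffusion:", x_noisy, eps)
```

In both cases a network would be trained to regress the second returned value from the first (plus the time step and the observation); only the interpolation path and the target differ.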
git clone https://github.com/ZidongChen25/Imitation_Policy_Minimal.git
cd Imitation_Policy_Minimal
conda create -n diffusion_policy_minimal python=3.10
conda activate diffusion_policy_minimal
pip install torch gymnasium tensorboard stable-baselines3 numpy pygame
Users planning to use GPU acceleration should install the appropriate CUDA-enabled build of PyTorch (e.g., pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118) or follow PyTorch’s official installation instructions for their CUDA version. For CPU-only use, the standard PyPI build of torch is sufficient.
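To confirm which build of PyTorch ended up in the environment, a quick check along these lines may help. The helper name `pick_device` is hypothetical and not part of this repository; it falls back to "cpu" when torch is missing or no CUDA device is visible.

```python
def pick_device():
    """Return "cuda" if a CUDA-enabled torch build sees a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # torch is not installed in this environment; nothing to accelerate.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print("Selected device:", pick_device())
```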
Imitation learning requires expert demonstrations. These can come from a human demonstrator or from a trained RL policy such as PPO, SAC, or DDPG. Because Pendulum is a simple environment, we use PPO to train the expert policy.
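Conceptually, collecting demonstrations means rolling out the expert and recording (observation, action) pairs. The sketch below shows that loop against the Gymnasium-style `reset`/`step` signatures. `StubEnv`, `expert`, and `collect_demonstrations` are hypothetical stand-ins so the example runs without extra dependencies; the repository's own scripts may be organized differently.

```python
def collect_demonstrations(env, policy, num_episodes):
    """Roll out `policy` in `env` and record the (observation, action) pairs."""
    observations, actions = [], []
    for _ in range(num_episodes):
        obs, _info = env.reset()
        done = False
        while not done:
            action = policy(obs)
            observations.append(obs)
            actions.append(action)
            obs, _reward, terminated, truncated, _info = env.step(action)
            done = terminated or truncated
    return observations, actions


class StubEnv:
    """Tiny stand-in that mimics the Gymnasium reset/step return values."""

    def __init__(self, episode_length=3):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return [1.0, 0.0, 0.0], {}

    def step(self, action):
        self.t += 1
        obs = [1.0, 0.0, float(self.t)]
        reward = -abs(action[0])
        terminated = False                          # this stub never terminates early
        truncated = self.t >= self.episode_length   # time-limit cutoff
        return obs, reward, terminated, truncated, {}


def expert(obs):
    """Placeholder controller; a trained PPO policy would go here."""
    return [-0.5 * obs[2]]


observations, actions = collect_demonstrations(StubEnv(), expert, num_episodes=2)
print(len(observations), len(actions))
```

With a real environment and a trained expert, the same loop would yield the dataset that the imitation policy is later fit to.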
-
Train the expert policy:
python expert_policy.py
-
Create demonstrations:
python generate_demonstration.py
This will create an expert_demo.npz file containing expert trajectories.
-
(Optional) Check how the expert policy is doing:
python policy_visualization.py
-
Train the policy:
python diffusion_policy.py --mode train
# or
python flow_matching.py --mode train
Training logs will be saved automatically and can be visualized via TensorBoard.
This is an easy task with a lightweight model, and training takes under 5 minutes on a MacBook Air.
- Evaluate the policy:
# Diffusion Policy
python diffusion_policy.py --mode inference_rgb_array  # Outputs average reward over 5 episodes
python diffusion_policy.py --mode inference_human      # Visualizes 1 episode
# Flow Matching Policy
python flow_matching.py --mode inference_rgb_array
python flow_matching.py --mode inference_human
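Before training, it can be useful to look inside the expert_demo.npz file produced by generate_demonstration.py. The sketch below lists every array in an .npz archive with its shape and dtype. The helper `summarize_npz` is hypothetical, and the key names `observations` and `actions` in the stand-in file are assumptions made only so the example runs on its own; the real file may use different keys, which is why the helper discovers them through `.files` rather than hard-coding them.

```python
import os
import tempfile

import numpy as np


def summarize_npz(path):
    """Return {array_name: (shape, dtype)} for every array stored in an .npz file."""
    with np.load(path) as data:
        return {name: (data[name].shape, str(data[name].dtype)) for name in data.files}


# Stand-in archive shaped like one Pendulum-v1 episode:
# 200 steps, 3-D observations, 1-D actions. Key names are hypothetical.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "example_demo.npz")
np.savez(
    demo_path,
    observations=np.zeros((200, 3), dtype=np.float32),
    actions=np.zeros((200, 1), dtype=np.float32),
)

summary = summarize_npz(demo_path)
for name, (shape, dtype) in summary.items():
    print(f"{name}: shape={shape}, dtype={dtype}")
```

To inspect the real file, call `summarize_npz("expert_demo.npz")` from the repository directory after running generate_demonstration.py.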
- Original papers: Diffusion Policy, Flow Matching
- Environment: Pendulum