
🤖 Imitation Policy Minimal

Are you tired of constantly switching between code files, papers, and tutorials?
Frustrated by endless dependency installations and environment conflicts—when all you want is to run simple demos and learn from the code?

Imitation Policy Minimal is a clean, educational implementation of imitation learning-based policies for embodied AI.
This project integrates both Diffusion Policy and Flow Matching for simple control tasks like Pendulum-v1.


📦 Installation

Clone the repository

git clone https://github.com/ZidongChen25/Imitation_Policy_Minimal.git
cd Imitation_Policy_Minimal

Create a Python environment (optional but recommended)

conda create -n diffusion_policy_minimal python=3.10
conda activate diffusion_policy_minimal

Install required packages

pip install torch gymnasium tensorboard stable-baselines3 numpy pygame

Users planning to leverage GPU acceleration should install the appropriate CUDA-enabled build of PyTorch (e.g., pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118) or follow PyTorch’s official installation instructions for their CUDA version. For CPU-only use, the standard PyPI build of torch is sufficient.

🚀 How to Use

Imitation learning requires expert demonstrations. These can come from a human demonstrator, or from a trained RL policy such as PPO, SAC, or DDPG. For the simple Pendulum-v1 environment, we train a PPO expert policy.

  1. Train the expert policy:

    python expert_policy.py
  2. Generate expert demonstrations:

    python generate_demonstration.py

    This will create an expert_demo.npz file containing expert trajectories.

  3. (Optional) Visualize the expert policy's performance:

    python policy_visualization.py
  4. Train the policy:

    python diffusion_policy.py --mode train  
    # or
    python flow_matching.py --mode train
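Step 2 above produces `expert_demo.npz`, a standard NumPy archive, so its contents can be inspected directly. The array names stored inside depend on `generate_demonstration.py`, so none are assumed here:

```python
# Inspect the arrays stored in the demonstration archive.
import numpy as np
from pathlib import Path

demo_path = Path("expert_demo.npz")
if demo_path.exists():
    data = np.load(demo_path)
    for name in data.files:
        print(name, data[name].shape)  # array name and its shape
else:
    print("expert_demo.npz not found; run generate_demonstration.py first")
```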

Training logs will be saved automatically and can be visualized via TensorBoard.

This is a simple task with a lightweight model; training completes within about five minutes on a MacBook Air.
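For orientation, the core of a flow-matching objective can be sketched in a few lines of PyTorch: train a network to predict the constant velocity that carries Gaussian noise to an expert action along a straight-line path. This is a hedged sketch with a made-up network; `flow_matching.py` in the repo will differ in its details.

```python
# Sketch of a conditional flow-matching loss (illustrative; not the repo's code).
import torch
import torch.nn as nn

obs_dim, act_dim = 3, 1  # Pendulum-v1 observation/action dimensions
net = nn.Sequential(nn.Linear(obs_dim + act_dim + 1, 64), nn.ReLU(),
                    nn.Linear(64, act_dim))

def flow_matching_loss(obs, expert_act):
    noise = torch.randn_like(expert_act)        # x_0 ~ N(0, I)
    t = torch.rand(expert_act.shape[0], 1)      # random time in [0, 1]
    x_t = (1 - t) * noise + t * expert_act      # point on the straight-line path
    target = expert_act - noise                 # constant velocity along that path
    pred = net(torch.cat([obs, x_t, t], dim=-1))
    return ((pred - target) ** 2).mean()        # regress predicted onto true velocity
```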

  5. Evaluate the policy:
    # Diffusion Policy
    python diffusion_policy.py --mode inference_rgb_array  # Outputs average reward over 5 episodes
    python diffusion_policy.py --mode inference_human       # Visualizes 1 episode
    
    # Flow Matching Policy
    python flow_matching.py --mode inference_rgb_array      
    python flow_matching.py --mode inference_human
