Code accompanying the paper "When Is Diversity Rewarded in Cooperative Multi-Agent Learning?" presented at ICLR 2026.
@inproceedings{bettini2026hetenvdesign,
title={When Is Diversity Rewarded in Cooperative Multi-Agent Learning?},
author={Amir, Michael and Bettini, Matteo and Prorok, Amanda},
year={2026},
booktitle={International Conference on Learning Representations (ICLR)},
}
This supplementary material contains our code and the configuration files needed to replicate all of our experiments and figures exactly as they appear in the paper.
- Create a virtual environment with Python 3.11 (e.g., with conda):
conda create --name env python=3.11
- Install the dependencies:
pip install torch==2.5 hydra-core torch_geometric wandb moviepy matplotlib==3.8
- Install our versions of VMAS, TensorDict, and TorchRL:
git clone -b het_env_design https://github.com/proroklab/VectorizedMultiAgentSimulator.git
pip install -e VectorizedMultiAgentSimulator
git clone -b het_env_design https://github.com/matteobettini/tensordict.git
cd tensordict
python setup.py develop
cd ..
git clone -b het_env_design https://github.com/matteobettini/rl.git
cd rl
python setup.py develop
cd ..
- Install the optional logging dependencies:
pip install wandb moviepy
- Try running a script (if a GPU is available, it will be used):
python HetEnvDesign/matrix_game_cont.py
Each experiment in the paper has a corresponding Python file and a yaml configuration folder (under the conf folder) containing its hyperparameters.
Here is how to reproduce the experiments (any parameter in the corresponding configuration folder can be passed on the command line).
The -m flag performs a Hydra multirun; see the Hydra docs for more information.
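As a concrete illustration of what a multirun does, a quick Python sketch can count how many runs a sweep launches; the value lists below are copied from the matrix-game commands in this README:

```python
from itertools import product

# Sweep values taken from the matrix-game multirun commands in this README
agg_type_task = ["min", "mean", "max"]
agg_type_agent = ["min", "mean", "max"]
seeds = list(range(9))  # seed=0,...,8

# Hydra's -m flag launches one run per element of the Cartesian product
runs = list(product(agg_type_task, agg_type_agent, seeds))
print(len(runs))  # 3 * 3 * 9 = 81 runs in total
```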
This is the "matrix game" considered in our experiments section.
Continuous:
python HetEnvDesign/matrix_game_cont.py -m env.scenario.gen_agg_type_task=min,mean,max env.scenario.gen_agg_type_agent=min,mean,max seed=0,1,2,3,4,5,6,7,8
Discrete:
python HetEnvDesign/matrix_game_disc.py -m env.scenario.gen_agg_type_task=min,mean,max env.scenario.gen_agg_type_agent=min,mean,max seed=0,1,2,3,4,5,6,7,8
Once run, you can plot the results from wandb (this creates the pdf used in the paper):
python HetEnvDesign/plotting/plot_matrix_games.py
This is the embodied environment studied in our experiments section.
python HetEnvDesign/goal_navigation_vanilla.py -m env.scenario.gen_agg_type_task=min,mean,max env.scenario.gen_agg_type_agent=min,mean,max seed=0,1,2,3,4,5,6,7,8
Once run, you can plot the results from wandb (this creates the pdf used in the paper):
python HetEnvDesign/plotting/plot_ctf_embodied_vanilla.py
This runs the experiments where agents are equipped with a range sensor of increasing radius.
python HetEnvDesign/goal_navigation_lidar.py -m env.scenario.lidar_range=0,0.1,0.2,0.35 seed=0,1,2,3
Once run, you can plot the results from wandb (this creates the pdf used in the paper):
python HetEnvDesign/plotting/plot_ctf_embodied_lidar.py
These are the co-design HED experiments, where HED optimizes the reward structure to favour heterogeneity.
Softmax:
python HetEnvDesign/goal_navigation_hed_softmax.py -m seed=0,1,2,3,4,5,6,7,8,9,10,11,12
Power-Sum:
python HetEnvDesign/goal_navigation_hed_powersum.py -m seed=0,1,2,3,4,5,6,7,8,9,10,11,12
Once run, you can plot the results from wandb (this creates the pdfs used in the paper):
python HetEnvDesign/plotting/plot_ctf_embodied_softmax_design.py
python HetEnvDesign/plotting/plot_ctf_embodied_powersum_design.py
The configuration for the codebase is available in the HetEnvDesign/conf folder.
Each parameter can be changed in the yaml files or from the command line using the Hydra override syntax; see the Hydra docs for more information.
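As a hypothetical illustration (the actual files under HetEnvDesign/conf may be organized differently), a parameter that appears nested in a yaml file:

```yaml
# Hypothetical excerpt from a conf yaml file; layout shown for illustration only
env:
  scenario:
    gen_agg_type_task: mean   # default value set in the yaml file
```

can be overridden at launch time with the standard Hydra dotted syntax, e.g. python HetEnvDesign/matrix_game_cont.py env.scenario.gen_agg_type_task=max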