Acme currently lacks first-class support for Gymnasium environments (the successor to OpenAI Gym). This issue proposes implementing a new wrapper (acme/wrappers/gymnasium_wrapper.py) or significantly refactoring the existing gym_wrapper.py to correctly support Gymnasium’s updated APIs, semantics, and space definitions.
This change is required for compatibility with modern RL environments and reproducible experimentation.
📌 Motivation
Gymnasium has breaking API changes compared to legacy Gym (reset/step semantics, seeding, spaces).
Many Acme users now rely on Gymnasium-based environments.
A robust wrapper enables seamless interoperability between Gymnasium → dm_env → Acme.
-
Dependency Management
Gymnasium must be imported safely (guarded import).
The wrapper should not break Acme installations where Gymnasium is not installed.
Gymnasium should be treated as an optional dependency.
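One way to meet these requirements is sketched below, under the assumption that the wrapper module resolves Gymnasium lazily. The helper names `optional_import` and `require` are hypothetical and not part of Acme; only `importlib.import_module` is a real standard-library call.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Return the named module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:  # also covers ModuleNotFoundError
        return None


def require(module: Optional[ModuleType], name: str, hint: str) -> ModuleType:
    """Raise a descriptive error only when the optional module is actually used."""
    if module is None:
        raise ImportError(f"{name} is required for this wrapper. {hint}")
    return module


# Importing the wrapper module itself never fails, even without Gymnasium.
gymnasium = optional_import("gymnasium")
```

The wrapper's constructor could then call `require(gymnasium, "gymnasium", "Install it with pip install gymnasium.")`, so the error surfaces only for users who actually instantiate the wrapper.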
-
Space Conversion
Acme uses dm_env.specs, while Gymnasium uses gymnasium.spaces.
-
Reset Semantics
Gymnasium reset() returns (observation, info).
Legacy Gym returned only observation.
The wrapper should:
Capture (obs, info)
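Optionally retain info on the wrapper itself (dm_env.TimeStep has no extras field; it carries only step_type, reward, discount, and observation)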
Return dm_env.restart(obs)
-
Step Semantics
Gymnasium step() returns:
(obs, reward, terminated, truncated, info)
This must be mapped to:
dm_env.TimeStep(step_type, reward, discount, observation)
Decision Logic:
terminated → natural termination
step_type = LAST
discount = 0.0
truncated → artificial termination (time limit)
step_type = LAST
discount = 1.0 (or configurable γ)
otherwise → mid-episode
step_type = MID
discount = 1.0
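The decision logic above can be sketched as a pure function. To keep the example dependency-free, `StepType` and `TimeStep` here are stand-ins that mirror the dm_env types; a real wrapper would presumably use `dm_env.termination`, `dm_env.truncation`, and `dm_env.transition` instead. The name `to_timestep` and the `truncation_discount` parameter are assumptions for illustration. The sketch also assumes that `terminated` takes precedence when both flags are set.

```python
import enum
from typing import Any, NamedTuple


class StepType(enum.IntEnum):
    """Stand-in mirroring dm_env.StepType."""
    FIRST = 0
    MID = 1
    LAST = 2


class TimeStep(NamedTuple):
    """Stand-in mirroring the four fields of dm_env.TimeStep."""
    step_type: StepType
    reward: Any
    discount: Any
    observation: Any


def to_timestep(obs, reward, terminated, truncated, truncation_discount=1.0):
    """Map a Gymnasium step result to a dm_env-style TimeStep."""
    if terminated:
        # Natural termination: no bootstrapping from the final state.
        # Counterpart in dm_env: dm_env.termination(reward, obs).
        return TimeStep(StepType.LAST, reward, 0.0, obs)
    if truncated:
        # Artificial cut-off (e.g. time limit): the agent may still bootstrap.
        # Counterpart in dm_env: dm_env.truncation(reward, obs, discount).
        return TimeStep(StepType.LAST, reward, truncation_discount, obs)
    # Ordinary mid-episode transition: dm_env.transition(reward, obs).
    return TimeStep(StepType.MID, reward, 1.0, obs)
```

Checking `terminated` first means a step that is both terminated and truncated gets a discount of 0.0, which avoids bootstrapping from a genuinely terminal state.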
-
Seeding
Gymnasium seeds an environment through reset(seed=...); the legacy env.seed() method no longer exists.
Acme environments often expose a global seed setter.
The wrapper must:
Bridge Acme-style seeding to Gymnasium’s reset(seed=...)
Ensure that the same seed reproduces the same sequence of episodes
🧠 Implementation Strategy
Initialization (__init__)
Store wrapped environment
Convert:
observation_spec
action_spec
reward_spec
Initialize internal episode state (_reset_next_step)
Accept an optional discount override applied on truncation (e.g. for infinite-horizon tasks)
Reset (reset)
Call env.reset(seed=...)
Capture (obs, info)
Set _reset_next_step = False
Return dm_env.restart(obs)
Step (step)
If _reset_next_step is True, call reset() and return its TimeStep
Call env.step(action)
Unpack (obs, reward, terminated, truncated, info)
Apply termination logic and set _reset_next_step = True when the episode ends
Return dm_env.TimeStep(...)