Decouple backend-specific cloning from InteractiveScene#5770
Conversation
There was a problem hiding this comment.
🤖 Isaac Lab Review Bot — Multi-Perspective Analysis
PR: Decouple backend-specific cloning from InteractiveScene
SHA: b12c47cc5d03d7896dad6b9600b57abf5f2b2da1
Architecture Assessment (Isaac Lab Expert)
This is an excellent structural improvement. The PR replaces InteractiveScene's hard-coded backend dispatching (if physx... elif newton... elif ovphysx...) with a self-registering queue pattern where each backend ships a <Backend>ReplicateContext + a queue_<backend>_replication() helper. The three orthogonal primitives (REPLICATION_QUEUE, ClonePlan, replicate()) compose well regardless of whether you're in InteractiveScene, a DirectRLEnv, or a standalone script.
Design wins:
- Clean separation of concerns —
InteractiveScenenow only cares about what to clone, not how - Backend extensibility without core modification (Open/Closed principle)
- Both invocation paths (
ReplicateSessioncontext manager vs. manualClonePlan.from_env_0+replicate) converge on the same drain function — no hidden divergence - The
replicate_priorityordering (physics=0, USD=100) ensures correct execution order
Findings
1. ✅ Performance Regression Risk: disabled_fabric_change_notifies Removed — ADDRESSED
disabled_fabric_change_notifies RemovedThe new UsdReplicateContext.replicate() method wraps copy work in disabled_fabric_change_notifies(self.stage) — performance optimization is preserved in the new architecture.
2. 🟡 Module-Level REPLICATION_QUEUE — Implicit Global State
REPLICATION_QUEUE is a module-level mutable list that asset constructors append to and replicate() drains. While the PR handles the "exception during session" case by clearing the queue in ReplicateSession.__exit__, there are still scenarios where the queue could accumulate stale entries:
- If
replicate()is called without aReplicateSessionand a backend raises mid-drain, the queue is already cleared up front (good), but the partial work from earlier backends cannot be rolled back. - If user code calls
queue_<backend>_replication(cfg)outside of any session/replicate cycle (e.g., in a test or standalone script that forgets to callreplicate()), entries accumulate silently until the nextreplicate()call — possibly in a completely unrelated context.
Suggestion: Consider adding a warnings.warn() or debug log in replicate() when queued contains entries whose cfg is not in plan.cfg_rows (currently silently skipped). This would help developers catch mismatched queue/plan scenarios.
3. ✅ ClonePlan.from_env_0 Couples to Live Queue State — ADDRESSED
ClonePlan.from_env_0 Couples to Live Queue StateThe docstring now documents the ordering constraint: "Must be called after all asset constructors have run, so their replication entries are already in the queue."
4. 🟢 _collect_asset_cfgs + _add_entities_from_cfg — Clear Flow
The scene now resolves {ENV_REGEX_NS} macros in _collect_asset_cfgs() before planning, and _add_entities_from_cfg() uses the already-resolved paths. The redundant double-format concern from earlier is no longer applicable.
5. 🟡 SimulationContext.instance() Crash in replicate()
In replicate(), the final line calls SimulationContext.instance().set_clone_plan(plan). If SimulationContext hasn't been initialized yet (e.g., in a unit test that manually creates a stage), .instance() returns None and this line raises AttributeError: 'NoneType' object has no attribute 'set_clone_plan'.
The ClonePlan.from_env_0 + replicate pattern in direct envs assumes a SimulationContext exists, but the function has no guard.
Suggestion: Add a defensive check: sim = SimulationContext.instance(); if sim is not None: sim.set_clone_plan(plan) — or document that replicate() requires an active SimulationContext.
6. 🟢 Strong Test Coverage for New Primitives
The test additions are solid:
test_usd_replicate_context_queue_and_replicate— validates the newUsdReplicateContexttest_queue_usd_replication_only_appends— confirms queue-only semanticstest_make_clone_plan_homogeneous_returns_env_root_plan— validates the fast-pathtest_make_clone_plan_heterogeneous_mutates_spawn_paths— validates multi-variant behaviortest_replicate_dedupes_shared_rows_across_cfgs— validates dedup logic_drain_replication_queuefixture prevents test cross-contamination
Gap: No integration test verifies the full ReplicateSession context-manager flow (enter → construct assets → exit → verify plan published). The unit tests cover individual primitives but not the composed lifecycle that InteractiveScene relies on. Consider adding a test that exercises ReplicateSession end-to-end with a mock backend.
Summary
| Category | Rating |
|---|---|
| Architecture | ✅ Excellent — clean decoupling, extensible backends |
| Error Handling | |
| Performance | ✅ Fabric notice suspension restored |
| Test Coverage | ✅ Good unit tests; integration test for composed lifecycle would strengthen |
| Migration Safety | ✅ Good — direct-env tasks migrated consistently |
Verdict: Strong architectural improvement ready for merge. The findings above are advisory — the design is sound and the implementation is clean.
Change History
Update (28b9473): Reviewed incremental changes from 322ec99a → 28b94730. This update rebases onto develop absorbing several merged PRs:
Upstream changes absorbed:
- #5928:
isaaclab_visualizerschangelog infrastructure — addedconfig/extension.tomlanddocs/CHANGELOG.rst - CI auto version bump: isaaclab 6.2.0 → 6.2.1, newton 0.14.0 → 0.14.1, ovphysx 3.0.0 → 3.0.1, physx 1.1.1 → 1.1.2, tasks 2.0.0 → 2.0.1, visualizers 0.1.0 → 0.1.1
- #5917 (TacSL update): PhysX config type migrations, deprecated wrench application calls, compliant-contact USD spawning fix
- #5926 (Newton SDF collision):
NewtonSDFCollisionPropertiesCfgfor SDF/hydroelastic collision attributes - #5930 (Cartpole refactor): Task consolidation,
permuteargument formdp.image()observation
PR-specific changes (replication-session-redesign):
- ✅ Changelog fragments added for the cloner refactor (
replication-session-redesign.minor.rst) - ✅ Skip fragments added for backend packages (newton, ovphysx, physx, tasks) indicating no user-facing changes in those packages from this PR
- ✅ Final squash commit titled "single commit" consolidates all PR work
Previous findings status:
- 🟡 Finding #2 (silent queue skipping) — unchanged, advisory
- 🟡 Finding #5 (SimulationContext guard in
replicate()) — unchanged, advisory
New observations:
- ✅ The
cartpole-image-permutefeature adds apermuteargument tomdp.image()for channel-first layout — clean addition, orthogonal to replication refactor - ✅
NewtonSDFCollisionPropertiesCfgshim exports added toisaaclab.simandisaaclab.sim.schemas— correctly wired through lazy exports - ✅ TacSL demo uses
PhysxRigidBodyMaterialCfginstead of deprecated aliases — aligns with the PhysX config consolidation
No new issues introduced. The PR remains ready for merge with the same advisory findings.
Previous updates:
- 322ec99: Rebase absorbing lazy pxr imports, resample_interval_on_reset, ProxyArray, task module renames, version bumps
- d3a2eab: Sensor USD replication queueing, OvPhysx envIdInBoundsBitCount authoring
- b12c47c: Squashed force-push with complete implementation
Update (463a05a): Reviewed incremental changes from 28b94730 → 463a05ae.
Changes in Newton replication (replicate.py):
- ✅
_build_newton_builder_from_mappingreturn type extended to 4-tuple, addingworld_xformslist of absolute per-world transforms - ✅
_cl_inject_sitesnow returnsworld_sites(bodyless sites needing per-world placement) - ✅ New per-world bodyless site loop correctly uses
wp.transform_multiply(world_xform, xform)to place sites in each env's global frame - ✅ Clear separation between
env_xform(delta-based, foradd_builder) andworld_xform(absolute, for sites and frame placement) - ✅
NewtonManager._world_xformsstored on commit — consumed byFrameViewfor camera/non-physics frame placement
Assessment: Clean, well-documented extension. The dual-transform pattern (env_xform vs world_xform) is correctly reasoned about in comments. No new issues introduced.
Previous findings status:
- 🟡 Finding #2 (silent queue skipping) — unchanged, advisory
- 🟡 Finding #5 (SimulationContext guard) — unchanged, advisory
Update (6f34c5a): Reviewed incremental changes from 463a05ae → 6f34c5ac.
This is a large consolidation commit that completes the replication session redesign. Key changes:
Core API changes:
- ✅
InteractiveScene.clone_environments()removed — replaced by explicitcloner.replicate(plan, stage=...)in all direct envs - ✅
InteractiveScene.env_ns/env_regex_nsproperties removed —cloner_cfg.clone_regexis now the canonical source - ✅
_build_clone_plan_from_cfg()replaced by simpler_collect_asset_cfgs()— returns resolved cfg list without planning logic - ✅
env_originsproperty now reads fromclone_plan.positionsinstead of internal_default_env_pose - ✅
make_clone_plan()API changed: takescfgslist +num_clones/env_spacinginstead of raw source/destination lists
Sensor replication queueing:
- ✅
Camera.__init__andBaseRayCaster.__init__now callqueue_usd_replication(self._source_cfg) - ✅
SensorBasestores original cfg as_source_cfgbefore copy for replication registration - ✅ Sensor
_initialize_implhandles Newton solver-side cloning (where USD only has env_0 prototype)
Backend replication contexts:
- ✅ Newton:
newton_replicate.py→replicate.py,NewtonReplicateContextfully replacesnewton_visualizer_prebuild - ✅ OvPhysX:
ovphysx_replicate.py→replicate.py,OvPhysxReplicateContextwraps the pending-clone registration - ✅ PhysX:
physx_replicate.py→replicate.py,PhysxReplicateContextencapsulates the replicator interface - ✅ All three backends now append
(cfg, ContextClass)toREPLICATION_QUEUEfrom asset constructors
Lazy import guards (isaaclab_experimental):
- ✅
envs/__init__.py,managers/__init__.py, and MDP action/term packages now uselazy_export()+.pyistubs - ✅ Runtime types (
Articulation,InteractiveScene,ContactSensor) moved behindTYPE_CHECKING - ✅
actions_cfg.pyuses stringclass_typereferences ("{DIR}.joint_actions:JointPositionAction") for deferred resolution
Task migrations:
- ✅ All 15+ direct envs consistently migrated to
cloner.grid_transforms()+ClonePlan.from_env_0()+cloner.replicate() - ✅ Direct task templates updated for new API
- ✅ Cartpole/Ant/Humanoid experimental tasks split env cfg into separate
_cfg.pyfiles to avoid eager backend imports
Test coverage:
- ✅ Comprehensive new tests for
UsdReplicateContext,queue_usd_replication,make_clone_plan(cfg-driven),ClonePlan.from_env_0,replicate(drain/dedup/priority/failure/skip) - ✅
ReplicateSessioncleanup-on-failure test added - ✅
resolve_clone_plan_sourcetests for nested/ambiguous/partial coverage - ✅ Old
clone_environments-based tests properly replaced with new_collect_asset_cfgsandreplicate-based tests
Minor cleanups:
- Module-level imports in
scene_data_provider.py,schemas_actuators.py,terrains/utils.py(removed deferred imports) clone_in_fabricdocstring updated to mark as deprecated- Test sensor files pass
env_ids/positions/cfg_rowstoClonePlanconstructors
Previous findings status:
- 🟡 Finding #2 (silent queue skipping) — unchanged, advisory
- 🟡 Finding #5 (SimulationContext guard in
replicate()) — unchanged, advisory
New observation:
- 🟡 The
generate_synthetic_gaussian_asset.pytest script manually constructs aClonePlan.from_env_0()+cloner.replicate()after the scene'sReplicateSessionhas already exited, because the camera is constructed late. This "Path B" pattern works but introduces a coupling to the scene's internal env layout. The inline comment documents this clearly, so it's acceptable.
Verdict: This commit delivers the complete migration away from InteractiveScene.clone_environments(). The architecture is clean, the lazy-import strategy prevents forbidden-import regressions, and the test coverage is thorough. Ready for merge.
fb61d98 to
090cdcc
Compare
|
Update (090cdcc): Summary of Incremental ChangesThis update contains significant refactoring across multiple files: 1.
2.
3.
4. New
5. New
6.
7. Backend Modules Updated:
8. Direct Envs Updated (
9. Test Coverage Expanded:
Assessment✅ No new issues introduced. The changes are consistent with the original PR design:
The removal of |
| hundreds, sometimes tens of thousands per GPU — and authoring each of those envs | ||
| by hand would be hopelessly slow. Cloning is Isaac Lab's answer: you author a | ||
| small representative scene under ``/World/envs/env_n`` and the cloner expands it | ||
| across the rest of the env population for you, optionally with per-env variation. |
There was a problem hiding this comment.
Might be worth adding the wording "heterogenous" when discussing per-env cloning variations (and maybe some details on how that work?)
There was a problem hiding this comment.
I dont know, I think I want to try avoid using the world heterogenous/homogeneous. Newton doesn't use it at all
| The plan is stage-agnostic by design — the same instance can be replayed against a | ||
| different stage, inspected by tooling, or serialized. | ||
|
|
||
| When every env is a copy of env_0: |
There was a problem hiding this comment.
Use the terminology "homogenous" here and other places where possible.
There was a problem hiding this comment.
for the same reason as mentioned above, but happy to be convinced otherwise
| ``clone_mask[i, j]`` is ``True`` when environment ``j`` should receive source row ``i``. | ||
| The same plan can be passed to USD replication, physics replication, and scene-data | ||
| providers. | ||
| When envs differ — say a cartpole in every env plus a 2-variant obstacle (box into |
There was a problem hiding this comment.
This reads a little bit confusing. Maybe re-write to something like:
"We can build heterogenous envs through ..., as an example we can build clone a cartpole environment that has two variants, half the environments use a box obstacle, the other half use sphere"
There was a problem hiding this comment.
this is a good suggestion Ill try to incorporate that.
| in the same ``cloner.replicate(plan, stage=...)`` call, so the choice between | ||
| them is purely about ergonomics: | ||
|
|
||
| * The first wraps both phases in a context manager and is what |
There was a problem hiding this comment.
Seems duplicated since below is the breakdown for all three, we can just remove these bullet points and maybe list the 3 replication methods, and then jump into below.
| of env_0. Reach for it in :class:`~isaaclab.envs.DirectRLEnv` and standalone | ||
| scripts that hand-build the env-0 prototype prim by prim. | ||
|
|
||
| ``ReplicateSession`` |
There was a problem hiding this comment.
We should use 1. 2. 3. for each of the methods, so that users can follow easily
|
|
||
| * :doc:`multi_asset_spawning` -- configuring multi-asset and multi-USD spawners. | ||
| * :doc:`optimize_stage_creation` -- Fabric cloning and stage-in-memory optimizations. | ||
| Newton isolates envs through its world system and does not need this pass. |
There was a problem hiding this comment.
Newton envs are inherently isolated, they don't truly live in the same world, since state is batched.
| device: Torch device for the mask and env id buffers. | ||
| positions: Optional per-env world positions [m], shape ``[num_clones, 3]``. | ||
| """ | ||
| from .replicate_session import REPLICATION_QUEUE # noqa: PLC0415 |
There was a problem hiding this comment.
some ai sloppyness...
| positions=plan.positions, | ||
| ) | ||
|
|
||
| for ctx in sorted(backend_ctxs.values(), key=lambda c: getattr(c, "replicate_priority", 0)): |
There was a problem hiding this comment.
For env cloning, why is the sorting important?
There was a problem hiding this comment.
in the case there is usd_cloning, its usually cloned at last to avoid other engine reads the entire scene.
| ) | ||
| self.cloner_cfg = cloner.CloneCfg(device=self.device) | ||
| env_root = self.cloner_cfg.clone_regex.rsplit("/", 1)[0] | ||
| self.env_prim_paths = [f"{env_root}/env_{i}" for i in range(self.cfg.num_envs)] |
There was a problem hiding this comment.
Is this still needed? Does not seem used anymore since clone plan uses the {} pattern
| spawn_ground_plane(prim_path="/World/ground", cfg=GroundPlaneCfg()) | ||
| # clone and replicate | ||
| self.scene.clone_environments(copy_from_source=False) | ||
| src, dest = "/World/envs/env_0", "/World/envs/env_{}" |
There was a problem hiding this comment.
Should we create a helper called cloner.homogenous_cloning_from_env_0(...) that makes these 4 lines into 1 line? Or even just cloner.homogenous_cloning
There was a problem hiding this comment.
I still want to make string explicit enough that user sees key information from the, I have implemented cloner.homogenous_cloning_from_env_0(...), but ditched this option because it feels a bit too magical.
322ec99 to
28b9473
Compare
463a05a to
78f998b
Compare
There was a problem hiding this comment.
🤖 Isaac Lab Review Bot — Update Review
SHA: 6f34c5acdbca989859a276e9339f51f23fe463b3
Previous SHA: 463a05aeb77f01c37831b4847d89d46eec596a88
Changes Since Last Review
Upstream changes absorbed:
- #5916: Warp-first experimental envs lazy export + cfg implementation split — splits env config classes (
*EnvCfg) into separate*_cfg.pymodules to avoid eager imports of backend-dependent types (Articulation,InteractiveScene,ContactSensor, action terms) at config-load time. The runtime env classes import the cfg at use time. This fixes the crash where loading a warp task's env config beforeSimulationAppwas launched would fail on forbidden imports. - #5929: Fix clone-plan source resolution for nested destination templates —
resolve_clone_plan_source()now correctly handles paths under nested templates (e.g.,/World/envs/env_.*/Robot/ee_link/Camera) by selecting the most specific (longest-matching) template.
PR-specific changes:
- ✅
78f998ba— fix replicate logic: FixedNewtonReplicateContext.replicate()to propagateworld_xformstoNewtonManager._world_xforms. The_build_newton_builder_from_mapping()function now correctly returns a 4-tuple includingworld_xforms, and the caller stores this onNewtonManagerwhencommit_to_manager=True. This ensuresFrameViewcan place non-physics frames (cameras) in the correct per-env locations. - ✅
6f34c5ac— lint: Import sorting in test file (cosmetic).
Assessment
The Newton replication fix (78f998ba) is critical — without it, cameras/frames under Newton-managed articulations would be placed incorrectly. The previous 463a05ae commit introduced world_xforms computation but failed to propagate it to NewtonManager._world_xforms. This commit completes that work:
# Before (463a05ae): world_xforms computed but lost
builder, stage_info, site_index_map = _build_newton_builder_from_mapping(...)
# After (78f998ba): world_xforms stored on manager
builder, stage_info, site_index_map, world_xforms = _build_newton_builder_from_mapping(...)
if self.commit_to_manager:
NewtonManager._world_xforms = world_xformsThe fix is clean and follows the established pattern.
Previous Findings Status
| Finding | Status |
|---|---|
| #1 Performance (fabric notices) | ✅ Addressed |
| #2 Silent queue skipping | 🟡 Advisory — unchanged |
#3 from_env_0 ordering |
✅ Addressed |
#4 _collect_asset_cfgs flow |
🟢 Clear |
| #5 SimulationContext guard | 🟡 Advisory — unchanged |
| #6 Test coverage | 🟢 Good |
No new issues introduced. The PR remains ready for merge with the same advisory findings from the initial review.
Verdict: ✅ Ready for merge
📝 Update — 3579870e (2026-06-03)
New commit: Makes ClonePlan fields (env_ids, positions, cfg_rows) optional with sensible defaults (None, None, field(default_factory=dict)).
Rationale: Allows lightweight ClonePlan construction for callers that only need iter_clone_plan_matches or resolve_clone_plan_source without providing the full set of replication fields. The docstring correctly documents that env_ids is still required for replicate().
Assessment: ✅ Clean change. Uses field(default_factory=dict) correctly for the mutable cfg_rows default. No new issues.
Summary
This PR removes backend-specific cloning logic from
InteractiveSceneandreplaces the previous implicit
replicate_session_defaults/replicate_sessionmachinery with three orthogonal, explicit primitives thatcompose identically whether you drive cloning through
InteractiveSceneorby hand in a
DirectRLEnvor standalone script.The result is that
InteractiveSceneno longer knows anything about USD vs.PhysX vs. Newton — it just enters a
ReplicateSessionand lets each asset'sconstructor register the backend(s) it needs. This is foundational for two
follow-on capabilities the project has wanted for a while: flexible backend
cloning and skip-cloning workflows.
What changed
New core primitives in
isaaclab.clonerREPLICATION_QUEUE— module-level list that asset constructors append(cfg, BackendCtxCls)pairs to via tiny per-backend helpers(
queue_usd_replication,queue_physx_replication,queue_newton_replication). Backends are no longer special-cased insideInteractiveScene; each one self-registers.ClonePlan— self-contained dataclass describing the world layout(
sources,destinations,clone_mask,env_ids,positions,cfg_rows). Stage-agnostic by design; the USD stage is now passedexplicitly to consumers so the same plan can be replayed, inspected, or
serialized.
replicate(plan, *, stage)— free function that drainsREPLICATION_QUEUEagainst a plan, groups queued cfgs by backend contextclass, runs each context in ascending
replicate_priorityorder (physicsbefore USD), publishes the plan to
SimulationContext, and clears thequeue. The queue is snapshotted and cleared up front so a backend failure
cannot leak stale entries into the next call.
ReplicateSessionis now a thin context manager that callsmake_clone_planin__enter__andreplicatein__exit__. Thestate-bag version with
plan/stage/cfg_rows/replicate_on_exitfields is gone.ClonePlan.from_env_0— classmethod that builds the single-sourcehomogeneous plan most direct envs need by auto-populating
cfg_rowsfrom
REPLICATION_QUEUEfiltered by env-root prefix.CloneCfg.clone_regex(default"/World/envs/env_.*") — singlesource of truth for the env-namespace convention.
InteractiveScenereads it directly when expanding
{ENV_REGEX_NS}cfg macros.Two equivalent invocation paths
Both end in the same
cloner.replicate(plan, stage=...)call. The onlydifference is how the plan was built and how asset construction was
interleaved.
What got removed from
InteractiveSceneclone_environments(...)deprecated shim. The scene now replicatesinside
__init__viaReplicateSession.env_ns/env_regex_nsproperties (used only internally)._build_clone_plan_from_cfgand_default_env_originsinternals.Cfg-driven plan construction now lives in
make_clone_plan; per-envpositions are read from the published
ClonePlan.InteractiveScene.env_originsnow reads from the plan published toSimulationContext, making the plan the single source of truth forenv placement.
Why this matters (the actual point of the PR)
This refactor is foundational for two capabilities the current scene
coupling blocks:
<Backend>ReplicateContextclass + a one-line queue helper. SwappingPhysX ↔ Newton no longer requires
InteractiveSceneto change; cfgsand user code stay untouched, and a third-party backend can register
itself without modifying core.
registration, and drain are three independent primitives, callers
that want to author env-0 prims by hand and skip the cloner — or
drive replication out-of-band from a visualizer, replay tool, or
test fixture — can do so without fighting
InteractiveScene.Migration notes
with cloner.ReplicateSession():(no-arg) →cloner.replicate(cloner.ClonePlan.from_env_0(...), stage=...).InteractiveScene.clone_environments(...)→ removed; the scenereplicates inside
__init__.make_clone_plan(sources, destinations, ...)→make_clone_plan(cfgs, num_clones, env_spacing, device, ...).stage=...explicitly toreplicate()andReplicateSession().CloneCfg.clone_regexif you previously usedInteractiveScene.env_ns/env_regex_ns.About 17 direct envs were migrated to the new pattern in this PR.
Test plan
./isaaclab.sh -p -m pytest source/isaaclab/test/sim/test_cloner.py./isaaclab.sh -p -m pytest source/isaaclab/test/scene/test_interactive_scene.py./isaaclab.sh -p -m pytest source/isaaclab_physx/test/sim/test_cloner.pycartpole,anymal_c,franka_cabinet,factory,humanoid_amp,inhand_manipulation,locomotion,quadcopter,shadow_hand_*,automate/*,cart_double_pendulum,cartpole_warp,inhand_manipulation_warp,locomotion_warp)spawn and step on PhysX
./isaaclab.sh -fpassescd docs && make html)