Seun/nist gear insertion task example by seun-doherty · Pull Request #567 · isaac-sim/IsaacLab-Arena

seun-doherty · 2026-04-09T23:37:56Z

Summary

Add a complete NIST gear insertion RL workflow: a Franka Panda robot inserts a medium gear onto a peg on the NIST assembly board using operational-space control and RL Games PPO
Include step-by-step documentation (environment setup, policy training, evaluation) mirroring the existing Franka lift task pages
Add RL Games policy wrapper, training script, and YAML config for LSTM-based asymmetric actor-critic

What's Included

Core task (isaaclab_arena/tasks/): task definition, keypoint-squashing rewards, 24-D policy observations with wrist-force feedback, insertion success / gear-drop terminations, domain randomization events

Environment (isaaclab_arena_environments/): OSC action term (asset-relative, EMA smoothing, dead-zones), Franka mimic OSC robot config, environment definition wiring scene + embodiment + task + RL Games

RL infrastructure (isaaclab_arena/policy/, scripts/): RL Games action policy wrapper, training script, PPO hyperparameter config

Documentation (docs/pages/example_workflows/nist_gear_insertion/): 4-page workflow (overview, environment setup, training, evaluation) with GIFs

Asset registry (isaaclab_arena/assets/object_library.py): NIST board, gear, peg, and connector asset definitions

NOTE: Object library paths for added assets will be updated when assets are uploaded to nucleus waiting on @alexmillane

NIST peg-insert gear assembly environment: custom OSC action term, keypoint squashing rewards, IK gripper reset, and domain randomization. Includes RL-Games PPO config, train/play scripts, and policy runner wrapper.

Isaac Lab 3.0 changed from wxyz to xyzw quaternion ordering. This caused the robot to spawn upside down, leading to IK solver failure, NaN observations, and training crashes. Key fixes: - Robot init rotation: (1,0,0,0) -> (0,0,0,0,1) in robot_configs.py - Grasp rotation offset: wxyz -> xyzw in environment config - Quaternion canonicalization: check w at index 3 (not 0) everywhere - Replace torch_utils (wxyz) with math_utils (xyzw) in OSC action - Wrap all warp arrays with wp.to_torch() for PyTorch compatibility - Update deprecated IsaacLab API calls to _index variants - Increase gpu_collision_stack_size to 4 GB for contact-heavy scenes

- Consolidate 3 observation files into single gear_insertion_observations.py - Replace 4 custom obs functions with Isaac Lab built-ins (root_pos_w, root_quat_w) - Slim NistGearInsertionTask constructor (40 → 17 params) via GraspConfig dataclass - Hardcode reward weights in configclass instead of passing through constructor - Delete bespoke play_rl_games.py; use generic policy_runner.py for evaluation - Genericise RlGamesActionPolicy (remove NIST-specific defaults) - Move RL-Games YAML config to isaaclab_arena_examples/policy/ - Clean up mdp/__init__.py re-exports - Add merge readiness report and NIST vs Lift comparison doc

- Keep success term registered (returns all-False during training) so SuccessRateMetric can query it, matching Lift task pattern - Loosen success_z_fraction from 0.05 to 0.15 (3mm depth threshold) - Add new NIST asset definitions to object_library

Add step-by-step documentation for the NIST gear insertion RL workflow (environment setup, policy training, evaluation) mirroring the existing Franka lift task pages. Include task overview GIFs, register the new workflow in the RL workflows index, and clamp NaN/inf values in force-torque observations to prevent training instabilities.

greptile-apps · 2026-04-09T23:44:05Z

Greptile Summary

This PR adds a complete NIST gear-insertion RL workflow for the Franka Panda robot, including an OSC action term, 24-D policy observations, keypoint-squashing rewards, reset-time IK grasp placement, an RL-Games policy wrapper, and four-page documentation.

Core task (tasks/nist_gear_insertion/): geometry utilities, keypoint rewards, insertion-success / drop terminations, and domain-randomization events wire together through a NistGearInsertionRLTask config class.
OSC action term (mdp/nist_gear_insertion/osc_action.py): ports the Forge peg-relative position target, 270\u00b0 yaw remap, EMA smoothing, and dead-zone clipping into the manager-based API; yaw dead-zone and clamping issues from prior rounds appear to be addressed.
RL-Games wrapper (policy/rl_games_action_policy.py): handles checkpoint restoration, RNN state resets per env-ids, and a shim that avoids phantom resets during inference.

Confidence Score: 3/5

The task can be run, but domain randomization of joint actuator gains and friction will silently do nothing because joint_names receives a bare string in both SceneEntityCfg calls, eliminating a key source of training variation without any error or warning.

The joint_names=arm_joints calls in the randomization event setup pass a raw Python string where a list is expected; Isaac Lab will iterate over its characters, find zero matching joints, and silently no-op both the actuator-gain and joint-friction randomization that the PR explicitly includes for robustness. The rest of the pipeline looks structurally sound, with several issues from prior review rounds addressed in this refactor.

isaaclab_arena/tasks/nist_gear_insertion/task.py (randomization event setup, lines 220-238)

Important Files Changed

Filename	Overview
isaaclab_arena/tasks/nist_gear_insertion/task.py	Core task config: geometry, observations, rewards, terminations, and domain randomization. The randomization events pass joint_names as a bare string which silently skips all joint actuator and friction randomization.
isaaclab_arena_environments/mdp/nist_gear_insertion/osc_action.py	OSC action term with EMA filtering, peg-relative position target, 270-degree yaw remap, dead-zone clipping, and reset-time seeding. Yaw dead-zone handling is addressed via clamp.
isaaclab_arena_environments/mdp/nist_gear_insertion/osc_observations.py	24-D policy observation term: peg-relative fingertip pose, velocity computed from unit quaternions before zeroing z/w, smoothed wrist force, and previous actions.
isaaclab_arena_environments/mdp/nist_gear_insertion/franka_osc_cfg.py	Environment-level action and observation configs. All sensor noise levels are hardcoded to 0.0, training without sensor noise.
isaaclab_arena/tasks/nist_gear_insertion/events.py	Reset event: IK-based gripper placement, DLS IK loop with convergence check, and gripper width open/close sequence.
isaaclab_arena/policy/rl_games_action_policy.py	RL-Games inference wrapper: env-wrapper shim to skip phantom resets, checkpoint restoration, RNN state management per env_ids.
isaaclab_arena_environments/nist_assembled_gearmesh_osc_environment.py	Environment wiring: assembles scene, task, embodiment, and MDP configs. All four NIST assets are added to the scene and correctly referenced in the task.

Sequence Diagram

sequenceDiagram
    participant Env as IsaacLabArenaEnvironment
    participant Task as NistGearInsertionRLTask
    participant Action as NistGearInsertionOscAction
    participant Obs as NistGearInsertionPolicyObservations
    participant Reward as gear_peg_keypoint_squashing
    participant OSC as OperationalSpaceController

    Env->>Task: reset() arrow place_gear_in_gripper
    Env->>Action: reset() arrow seed smoothed_actions from EE pose
    Env->>Obs: reset() arrow store prev_pos / prev_quat

    loop Every step
        Env->>Action: process_actions(policy_out)
        Action->>Action: EMA filter, peg-relative pos, yaw remap
        Action->>OSC: set_command(pose_abs)
        OSC-->>Env: joint torques
        Env->>Obs: __call__() arrow 24-D policy obs
        Obs->>Action: read smoothed_force, prev_actions
        Env->>Reward: __call__() arrow keypoint bonus
    end

_{Reviews (8): Last reviewed commit: "Align NIST gear insertion success geomet..." | Re-trigger Greptile}

greptile-apps · 2026-04-09T23:44:08Z

+        noisy_quat = math_utils.quat_mul(
+            ft_quat,
+            math_utils.quat_from_angle_axis(rot_noise_angle, rot_noise_axis),
+        )
+        noisy_quat[:, [2, 3]] = 0.0
+        noisy_quat = noisy_quat * self._flip_quats.unsqueeze(-1)
+
+        arm_osc_action = env.action_manager._terms["arm_action"]
+        board_pos = wp.to_torch(board.data.root_pos_w) - origins
+        board_quat = wp.to_torch(board.data.root_quat_w)
+        peg_offset_exp = self._peg_offset.unsqueeze(0).expand(n, 3)
+        peg_pos = board_pos + math_utils.quat_apply(board_quat, peg_offset_exp)
+        noisy_fixed_pos = peg_pos + arm_osc_action.fixed_pos_noise
+
+        fingertip_pos_rel = noisy_pos - noisy_fixed_pos
+
+        safe_dt = max(dt, 1e-6)
+        ee_linvel = (noisy_pos - self._prev_noisy_pos) / safe_dt
+        self._prev_noisy_pos[:] = noisy_pos
+
+        rot_diff = math_utils.quat_mul(noisy_quat, math_utils.quat_conjugate(self._prev_noisy_quat))
+        rot_diff = rot_diff * torch.sign(rot_diff[:, 3]).unsqueeze(-1)
+        ee_angvel = axis_angle_from_quat(rot_diff) / safe_dt
+        ee_angvel[:, 0:2] = 0.0
+        self._prev_noisy_quat[:] = noisy_quat


Non-unit quaternion corrupts ee_angvel computation

noisy_quat[:, [2, 3]] = 0.0 zeroes the z-imaginary and w-real components of the xyzw quaternion, producing [x, y, 0, 0] whose norm is sqrt(x²+y²) ≠ 1 (and is exactly 0 at the identity rotation where x=y=0). This non-unit quaternion is then fed directly into quat_mul / axis_angle_from_quat to estimate ee_angvel, which requires unit quaternions. The angular velocity signal included in the 24-D policy observation will be numerically incorrect — and zero at the identity pose — which can silently degrade training.

The ee_angvel should be computed from the normalized quaternions before the zeroing step. The zeroed representation is only needed for the neural-network input. For example:

# 1. Compute angular velocity from unit quaternions first rot_diff_full = math_utils.quat_mul(ft_quat, math_utils.quat_conjugate(self._prev_noisy_quat_full)) rot_diff_full = rot_diff_full * torch.sign(rot_diff_full[:, 3]).unsqueeze(-1) ee_angvel = axis_angle_from_quat(rot_diff_full) / safe_dt # 2. Then zero z/w for the policy observation representation noisy_quat_for_obs = noisy_quat.clone() noisy_quat_for_obs[:, [2, 3]] = 0.0 noisy_quat_for_obs = noisy_quat_for_obs * self._flip_quats.unsqueeze(-1) self._prev_noisy_quat[:] = ft_quat # track full unit quaternion

greptile-apps · 2026-04-09T23:44:09Z

+        flip[torch.rand(n, device=dev) > 0.5] = -1.0
+        self._flip_quats[env_ids] = flip
+
+        self._ensure_body_indices()
+        robot: Articulation = self._env.scene[self._robot_name]
+        origins = self._env.scene.env_origins
+        self._prev_noisy_pos[env_ids] = (
+            wp.to_torch(robot.data.body_pos_w)[env_ids, self._fingertip_idx] - origins[env_ids]
+        )
+        self._prev_noisy_quat[env_ids] = wp.to_torch(robot.data.body_quat_w)[env_ids, self._fingertip_idx]
+


First reset mismatches unit vs. zeroed quaternion

In reset(), self._prev_noisy_quat[env_ids] is stored as the raw body quaternion (full unit quaternion). But in __call__, noisy_quat is zeroed at indices [2, 3] and then saved to self._prev_noisy_quat. So the very first step after every episode reset computes rot_diff from a full unit quaternion (prev) vs. a zeroed non-unit quaternion (current), causing a large spurious angular velocity spike at step 0 of each episode. The reset path should store the same representation that __call__ will write, or the angular velocity should be zeroed at the reset step.

greptile-apps · 2026-04-09T23:44:10Z

+    ``gear_insertion_success_bonus`` / ``gear_mesh_insertion_success``), not the gear
+    rigid-body root, consistent with common assembly peg-insert benchmarks.
+    """
+
+    def __init__(self, cfg: RewardTermCfg, env: ManagerBasedRLEnv):
+        super().__init__(cfg, env)
+        self._pred_scale = 0.0
+        self._delay_until_ratio: float = cfg.params.get("delay_until_ratio", 0.25)
+        self._gear_cfg: SceneEntityCfg = cfg.params["gear_cfg"]
+        self._board_cfg: SceneEntityCfg = cfg.params["board_cfg"]
+        hgo = cfg.params.get("held_gear_base_offset", [2.025e-2, 0.0, 0.0])


_pred_scale is never reset — curriculum latches permanently

self._pred_scale starts at 0.0 and is set to 1.0 once true_success.float().mean() >= self._delay_until_ratio. Because it is an instance-level float with no reset path, once activated it stays at 1.0 for the remainder of training — even if the policy later regresses below the threshold. Whether this irreversibility is intentional (one-way curriculum gate) should be confirmed; if performance can degrade after the switch, the penalty will still apply and may prevent recovery.

greptile-apps · 2026-04-09T23:44:11Z

+                    "asset_cfg": SceneEntityCfg("robot", joint_names=[arm_joints]),
+                    "stiffness_distribution_params": (0.75, 1.5),
+                    "damping_distribution_params": (0.3, 3.0),
+                    "operation": "scale",
+                    "distribution": "log_uniform",
+                },
+            )
+            cfg.robot_joint_friction = EventTermCfg(
+                func=mdp_isaac_lab.randomize_joint_parameters,
+                mode="reset",
+                params={
+                    "asset_cfg": SceneEntityCfg("robot", joint_names=[arm_joints]),


joint_names=[arm_joints] — regex string wrapped in a list

arm_joints is a regex string ("panda_joint.*" or "panda_joint[1-7]"), but it is passed as joint_names=[arm_joints] — a list containing one element. Whether SceneEntityCfg interprets each list element as a regex pattern or as a literal name depends on the Isaac Lab version. If treated as a literal name, it will match zero joints and silently produce no-ops for actuator-gain and joint-friction randomization. The same pattern appears at line 264. Verify the expected API; if a single regex is correct, pass it directly:

"asset_cfg": SceneEntityCfg("robot", joint_names=arm_joints),

greptile-apps · 2026-04-09T23:44:12Z

+    held_gear_base_offset: list[float] | None = None,
+    gear_peg_height: float = 0.02,
+    success_z_fraction: float = 0.80,
+    xy_threshold: float = 0.0025,
+    rl_training: bool = False,
+) -> torch.Tensor:
+    """Terminate when the gear is inserted onto the peg to the required depth.
+
+    Checks that the held gear's base (root + held_gear_base_offset in gear frame)
+    is centered on the peg (XY) and lowered past a fraction of the peg height (Z).
+    Peg position is fixed_asset pose + gear_base_offset in the fixed asset's local frame.
+
+    When ``rl_training`` is True, always returns False (no early termination)
+    but the term stays registered so that ``SuccessRateMetric`` can query it.
+    """
+    if rl_training:
+        return torch.zeros(env.num_envs, dtype=torch.bool, device=env.device)
+    held_object: RigidObject = env.scene[held_object_cfg.name]
+    fixed_object: RigidObject = env.scene[fixed_object_cfg.name]
+
+    held_root = wp.to_torch(held_object.data.root_pos_w) - env.scene.env_origins
+    held_quat = wp.to_torch(held_object.data.root_quat_w)
+    h_offset = held_gear_base_offset if held_gear_base_offset is not None else gear_base_offset
+    held_off = torch.tensor(h_offset, device=env.device, dtype=torch.float32).unsqueeze(0).expand(env.num_envs, 3)
+    held_base_pos = held_root + math_utils.quat_apply(held_quat, held_off)


gear_orientation_exceeded reads an attribute that is never set

env._initial_gear_ee_relative_quat is accessed inside the function body, but no event, task, or reset logic in this PR ever sets this attribute on the env. The hasattr guard makes the function silently return all-False (never triggers), so it is effectively dead code. If this termination is intended for future use, a TODO comment would clarify that; if it should be active now, the attribute needs to be populated — e.g., in the place_gear_in_gripper event.

greptile-apps · 2026-04-09T23:44:13Z

+
+    name = "gears_and_base"
+    tags = ["object"]
+    usd_path = str(_REPO_ROOT / "assets" / "gearbase_and_gears_gearbase_root.usd")


Asset USD paths reference local files that don't exist in the repo

All new NIST asset classes point to _REPO_ROOT / "assets" / "*.usd" paths that are not committed. The PR description notes these will be updated when assets are uploaded to Nucleus, but loading any of these asset classes will raise a file-not-found error at sim startup. Consider adding a clear NOTE comment on each class to make the stub status explicit.

kellyguo11

PR #567 Review: NIST Gear Insertion Task with RL-Games Training Pipeline

Overall Assessment

This is a substantial and well-structured PR that adds a complete RL workflow for contact-rich gear insertion on the NIST assembly board. The architecture follows existing Arena patterns (lift task), the documentation is thorough, and the commit history shows thoughtful iteration (quaternion migration, refactoring, observation hardening). Good work overall.

That said, there are several issues worth addressing before merge — ranging from correctness concerns to code quality and maintainability.

🔴 Issues (Should Fix)

1. Duplicate copyright headers in train_rl_games.py
Lines 1-5 and 7-8 both have copyright/SPDX headers. Remove the duplicate.

2. gpu_collision_stack_size=2**32 - 1 in env_callbacks.py is 4 GB — this affects ALL assembly environments, not just this task
This is set in the shared assembly_env_cfg_callback. Other tasks that import this callback will now request 4 GB collision stack. Consider:

Making it configurable per-environment, or
At minimum, adding a comment explaining why this value is needed (contact-heavy gear mesh scenes), so future maintainers know whether they can tune it down.

3. Mutable default arguments in function signatures
Multiple functions use mutable list defaults (peg_offset: list[float] = [0.0, 0.0, 0.0]). This is a well-known Python anti-pattern. Examples:

gear_insertion_rewards.py: gear_insertion_engagement_bonus, gear_insertion_success_bonus, _check_gear_position
gear_insertion_observations.py: peg_pos_in_env_frame, peg_delta_from_held_gear_base, held_gear_base_pos_in_env_frame
terminations.py: gear_mesh_insertion_success, gear_dropped_from_gripper

While these are typically called by the manager framework (which passes explicit args), it's still a latent bug. Use tuple or None with a default factory pattern.

4. NistAssembledBoard vs NistBoardAssembled in object_library.py
Two very similarly named asset classes with different name fields ("nist_assembled_board" vs "nist_board_assembled"). The environment only uses "nist_board_assembled". Is NistAssembledBoard actually needed? If not, remove it to avoid confusion. If both are needed, add docstrings explaining the distinction.

5. Missing __init__.py for observations and rewards subdirectories
isaaclab_arena/tasks/observations/gear_insertion_observations.py and isaaclab_arena/tasks/rewards/gear_insertion_rewards.py are new files in subdirectories. Are there __init__.py files already? The diff doesn't show them being created, but imports in nist_gear_insertion_task.py use direct module paths, so this may be fine if the parent __init__.py already exists. Worth verifying.

6. NistGearInsertionPolicyObservations accesses private action_manager._terms
In gear_insertion_observations.py line ~230: arm_osc_action = env.action_manager._terms["arm_action"]. Accessing _terms (private dict) is fragile. If there's a public API for this (like env.action_manager.get_term("arm_action")), use it. Same pattern appears in gear_insertion_rewards.py (osc_action_magnitude_penalty, osc_action_delta_penalty, wrist_contact_force_penalty, success_prediction_error).

7. force_torque_at_body uses robot.root_view.get_link_incoming_joint_force() — undocumented API
get_link_incoming_joint_force() is a PhysX-level call via root_view. This couples the code to specific Isaac Sim internals. Same pattern in NistGearInsertionPolicyObservations.__call__. Add a comment explaining this dependency.

🟡 Suggestions (Nice to Have)

8. Line length violations
Several lines exceed reasonable width (>120 chars), making diffs harder to review:

gear_insertion_observations.py line 60 (held_gear_base_pos_in_env_frame)
terminations.py line ~298 (tensor creation in gear_mesh_insertion_success)
gear_insertion_rewards.py line ~143

9. Duplicated geometry checks across rewards, terminations, and observations
_check_gear_position (rewards), gear_mesh_insertion_success (terminations), and the success prediction logic all compute the same peg-vs-held-gear-base geometry independently. Consider extracting a shared utility function to reduce drift risk.

10. grasp_rot_offset default [1.0, 0.0, 0.0, 0.0] in GraspConfig — this is xyzw format with w=0
[1.0, 0.0, 0.0, 0.0] — if this is xyzw convention, then w=0 which is not a valid quaternion. The environment overrides it with [1.0, 0.0, 0.0, 0.0] too. Given the Isaac Lab 3.0 migration to xyzw, this looks like it should be [0.0, 0.0, 0.0, 1.0] (identity in xyzw). The commit message says wxyz→xyzw migration was done, but the default in the dataclass may have been missed. Verify this is intentional — the environment always overrides it, but the default is misleading/incorrect.

11. NistGearSmall and NistGearLarge are registered but never used
These asset classes exist in object_library.py but nothing in this PR references them. If they're for future use, add a brief comment. Otherwise, defer adding them until they're needed.

12. concate_obs (typo?) in RL Games YAML and wrapper
concate_obs_groups appears in the YAML config — is this intentionally concate (not concatenate)? This follows the existing RL Games wrapper naming, but worth noting.

13. RL Games YAML: player.deterministic: False
The default player config has deterministic: False. The RlGamesActionPolicy class defaults to deterministic: True. This inconsistency could cause confusion when someone evaluates using the raw RL Games player vs. the Arena policy runner.

14. Consider adding type hints to _KeypointDistanceComputer.compute
The class works well but return types and param types aren't annotated, which would help maintainability.

15. place_gear_in_gripper iterates gripper_joint_setter_func per-row
The loop for row_idx in range(n): self.gripper_joint_setter_func(...) is called twice (open then close). This is O(n) Python loops over environments. Since the setter just does joint_pos[row_indices, jid] = width / 2.0, this could be vectorized to a single tensor operation.

✅ What's Good

Clean separation of task/environment/policy following Arena conventions
Documentation mirrors the existing Franka lift workflow — consistent UX
Observation hardening (nan_to_num, force clamping) is a smart addition for contact-rich tasks
Commit history is well-structured: each commit has a clear purpose
Domain randomization is comprehensive (friction, mass, actuator gains, joint friction, yaw variation)
The GraspConfig dataclass is a good refactor — keeps the task constructor focused

Summary

The PR is in good shape architecturally. The main items to address are:

The gpu_collision_stack_size change scope (affects all assembly envs)
The quaternion default in GraspConfig (verify xyzw correctness)
Mutable default args (easy fix, use tuples)
Clean up the duplicate asset class (NistAssembledBoard vs NistBoardAssembled)
Duplicate copyright header in training script

The rest are quality-of-life improvements that can be addressed in follow-ups.

kellyguo11

PR #567 Code Review: NIST Gear Insertion Task

Summary

Well-structured PR adding a complete RL workflow for contact-rich gear insertion. The architecture follows Arena patterns cleanly. I see the prior review from @kellyguo11 already covered several important items — I'll focus on additional observations and a few overlapping concerns worth reinforcing.

🔴 Issues

1. _REPO_ROOT resolution in object_library.py is fragile
_REPO_ROOT = Path(__file__).resolve().parent.parent.parent computes a repo root from file location. If the package is installed as editable vs. site-packages, or the directory structure changes, this silently breaks. Consider using importlib.resources or a package-level constant.

2. wp.to_torch() wrapping throughout — potential memory/lifecycle issue
Every observation/reward/termination call does wp.to_torch(asset.data.root_pos_w). If Warp returns views into GPU memory, the torch tensors share that buffer. Any intermediate sim step could invalidate them. The code generally uses these in-place within a single step, which is fine, but the pattern of wp.to_torch(board.data.root_pos_w)[:n] followed by arithmetic should be verified not to alias buffers that get overwritten.

3. success_prediction_error reward uses mutable _pred_scale — non-deterministic across resets
self._pred_scale starts at 0.0 and flips to 1.0 once the mean success rate crosses delay_until_ratio. But it never resets back to 0.0. If training performance dips after initially hitting the threshold, the prediction penalty stays active. This is a design choice but should be documented — it creates a one-way ratchet effect on the reward landscape.

4. place_gear_in_gripper IK loop doesn't write root poses — sim state may be stale
The IK loop in events.py writes joint positions to sim (write_joint_position_to_sim_index) but never calls env.scene.write_data_to_sim() or steps the simulation between iterations. The body positions read via robot.data.body_pos_w may not reflect the written joint positions until the next sim step. If this IK converges in practice it's fine, but the convergence relies on the FK being updated between writes, which may only work because of how the articulation view caches data.

5. NistGearInsertionOscAction.process_actions — roll/pitch clipping logic has redundant branches
The code computes desired_roll and desired_pitch from target_quat (which already locks roll/pitch via flip_quat), then clips them against the current values. Since the target always has roll=π and pitch=0, the delta computation and clipping could be simplified. The current approach works but is harder to reason about correctness.

🟡 Suggestions

6. Observation function accesses arm_osc_action._smoothed_actions and .force_smooth — tight coupling
The policy observation function (NistGearInsertionPolicyObservations.__call__) both reads from and writes to the action term (arm_osc_action.force_smooth[:] = ...). An observation function mutating action-term state is a surprising side effect. Consider moving the force smoothing update into the action term's process_actions or a dedicated pre-observation hook.

7. contact_thresholds initialized to 7.5 but range is (5.0, 10.0) — reset-only randomization
In NistGearInsertionOscAction.__init__, contact_thresholds is set to 7.5 (midpoint). The randomization only happens on reset(). For the first episode before any reset is called, all envs use 7.5. This may be intentional but could subtly affect the first few rollouts.

8. _CACHED_OFFSETS global dict in gear_insertion_rewards.py — unbounded growth
The cache key is (tuple(values), device) and the tensors are only grown, never shrunk. In practice, with fixed offsets this is fine, but it's a pattern that could leak memory if offset values vary (e.g., if someone adds per-episode offset randomization). Consider a bounded cache or document the assumption.

9. RL Games YAML uses seq_length: 128 equal to horizon_length: 128
With LSTM and seq_length == horizon_length, each minibatch is a single unbroken sequence. This means no sequence boundary handling within a batch. This works but may limit effective batch diversity. Consider whether seq_length: 64 (half horizon) would give better gradient estimates.

10. .gitattributes adds *.jpg filter=lfs and *.pth filter=lfs globally
This now tracks ALL .jpg and .pth files in the repo via LFS, not just the ones in this PR. If someone later adds a small JPG or checkpoint for testing, it'll be LFS-tracked. Consider scoping to specific directories (e.g., docs/images/*.jpg, models/**/*.pth).

11. events.py API migration (write_root_pose_to_sim → write_root_pose_to_sim_index) touches all environments
The changes from write_root_pose_to_sim to write_root_pose_to_sim_index (and similar for velocity/joints) modify shared event functions used by other environments. This is likely needed for the Isaac Lab 3.0 migration, but it's a significant change bundled into a task-specific PR. If any of the old callers pass different argument patterns, this could break them.

12. Documentation is high quality but could note the isaaclab_tasks.direct.automate dependency
The place_gear_in_gripper event imports factory_control from isaaclab_tasks.direct.automate. This creates a dependency on the Factory task package. Worth noting in the environment setup docs or in a comment.

✅ What's Well Done

The keypoint squashing reward design with three scales (baseline/coarse/fine) is a solid curriculum-like approach
EMA smoothing on the action space with per-episode randomized factor is a nice touch for sim-to-real transfer
The GraspConfig dataclass cleanly separates embodiment concerns from task logic
Dead-zone thresholds on position/rotation prevent chattering near the target
Documentation follows the existing Arena workflow pattern consistently
The check_gear_insertion_geometry shared utility for the XY+Z check (addressing the duplication concern from the prior review) is the right approach

Verdict

The PR is architecturally sound and follows Arena conventions well. The main concerns are: (1) the observation function mutating action-term state (item 6), (2) the global .gitattributes scope change (item 10), and (3) the shared event function API migration bundled into this PR (item 11). The rest are quality improvements that can be addressed incrementally.

- Fix mutable default args (list -> tuple) in rewards/terminations - Remove duplicate copyright header in train_rl_games.py - Remove unused asset classes (NistAssembledBoard, NistGearSmall, NistGearLarge) - Add gpu_collision_stack_size comment explaining 4 GB requirement - Add docstrings for private API access (_terms, get_link_incoming_joint_force) - Extract shared check_gear_insertion_geometry utility - Fix GraspConfig.grasp_rot_offset default to xyzw identity [0,0,0,1] - Replace manual quat sign-flip with quat_unique - Fix observation reset() consistency (zero z/w, normalize, flip) - Vectorize place_gear_in_gripper gripper loop - Set player.deterministic: True in RL Games YAML - Add cached offset helper for reward tensor allocations - Migrate environment from RLFramework enum to string-based entry point - Add distributed training support to train_rl_games.py Signed-off-by: odoherty <odoherty@nvidia.com>

alexmillane

Thanks for putting this together. The docs look great. I have some comments on some of the code style

alexmillane · 2026-04-14T07:49:47Z

+class NistGearInsertionOscActionCfg(OperationalSpaceControllerActionCfg):
+    """Config for :class:`NistGearInsertionOscAction`."""
+
+    class_type: type[ActionTerm] = NistGearInsertionOscAction
+
+    fixed_asset_name: str = "gears_and_base"
+    peg_offset: tuple[float, float, float] = (0.0, 0.0, 0.0)
+    fixed_pos_noise_levels: tuple[float, float, float] = (0.001, 0.001, 0.001)
+    pos_action_bounds: tuple[float, float, float] = (0.05, 0.05, 0.05)
+    pos_action_threshold: tuple[float, float, float] = (0.02, 0.02, 0.02)
+    rot_action_threshold: tuple[float, float, float] = (0.097, 0.097, 0.097)
+    pos_threshold_noise_level: tuple[float, float, float] = (0.25, 0.25, 0.25)
+    rot_threshold_noise_level: tuple[float, float, float] = (0.29, 0.29, 0.29)
+    ema_factor_range: tuple[float, float] = (0.05, 0.2)
+    contact_threshold_range: tuple[float, float] = (5.0, 10.0)
+    # Dead zone: zero out small commanded deltas on the task wrench.
+    pos_dead_zone: tuple[float, float, float] = (0.0005, 0.0005, 0.0005)  # m, ~0.5 mm
+    rot_dead_zone: float = 0.001  # rad


Suggestion to add comments to these parameters. It's difficult to tell what they do without comments.

Also, consider including a default CONSTANT value so it's easier to avoid mistakes

alexmillane · 2026-04-14T07:50:30Z

+        yaw_bolt = math_utils.euler_xyz_from_quat(quat_rel_bolt, wrap_to_2pi=True)[2]
+        yaw_bolt = torch.where(yaw_bolt > math.pi / 2, yaw_bolt - 2 * math.pi, yaw_bolt)
+        yaw_bolt = torch.where(yaw_bolt < -math.pi, yaw_bolt + 2 * math.pi, yaw_bolt)
+        yaw_action = (yaw_bolt + math.radians(180.0)) / math.radians(270.0) * 2.0 - 1.0


This operation, divide by 270 degrees, seems odd. Suggestion to move this to a function with a description of the operation.

alexmillane · 2026-04-14T07:52:48Z

+        unrot_pi = math_utils.quat_from_euler_xyz(
+            torch.full((n,), -math.pi, device=self.device),
+            torch.zeros(n, device=self.device),
+            torch.zeros(n, device=self.device),
+        )


Is this a rotation of -pi around x?

suggestion to rename negative_pi_around_x.

The current name doesn't give much of a hint of what it's doing.

alexmillane · 2026-04-14T07:55:30Z

+            torch.zeros(n, device=self.device),
+            torch.zeros(n, device=self.device),
+        )
+        quat_rel_bolt = math_utils.quat_mul(unrot_pi, ee_quat)


Suggestion to rename to what this means.

q_xyzw_bolt_from_ee could mean the rotation that rotates vectors in the ee-frame to the bolt frame.

alexmillane · 2026-04-14T07:59:43Z

+        yaw_bolt = torch.where(yaw_bolt > math.pi / 2, yaw_bolt - 2 * math.pi, yaw_bolt)
+        yaw_bolt = torch.where(yaw_bolt < -math.pi, yaw_bolt + 2 * math.pi, yaw_bolt)


Not sure I get this. At first glance, it appears like some normalization, however on one side we compare with math.pi / 2 and on the other side with -math.pi.

It's a bit difficult to make sense of this. Suggestion to move it to a function where you describe the inputs and outputs.

alexmillane · 2026-04-14T08:09:24Z

We should move this to the new interop approach where we call lab scripts directly.

You can refer to https://isaac-sim.github.io/IsaacLab-Arena/main/pages/example_workflows/reinforcement_learning/step_2_policy_training.html#training-command
for how to use lab's train script on Arena-regsitered env

I checked the current setup: Arena’s documented --external_callback flow exists for rsl_rl/train.py, but Isaac Lab’s rl_games/train.py does not currently expose the same callback hook. Can we add the same --external_callback handling to Isaac Lab’s RL-Games train/play scripts that RSL-RL already has.

Reference - submodules/IsaacLab/scripts/reinforcement_learning/rsl_rl/train.py:69

I see. Can you break this MR into 2 MRs? One for env&task itself. The other for doc including how to train with your env? We can at least get the 1st MR in soon.
We will work with Lab on rl_games to see how feasible it is, and then upgrading Lab submodule will happen at a later time (early June).

alexmillane · 2026-04-14T09:21:14Z

alexmillane · 2026-04-14T09:27:19Z

+    pos = wp.to_torch(board.data.root_pos_w) - env.scene.env_origins
+    quat = wp.to_torch(board.data.root_quat_w)
+    offset = torch.tensor(peg_offset, device=env.device, dtype=torch.float32).unsqueeze(0).expand(env.num_envs, 3)
+    return pos + math_utils.quat_apply(quat, offset)


I'm not sure I understand this function.

Is this calculating the peg position w.r.t. the board?

Shouldn't such a function take in two SceneEntityCfg, look up their positions and calculate the difference.

The function name suggests that this is the peg position in the env frame... Why then do we need the board position?

alexmillane · 2026-04-14T09:28:04Z

+def held_gear_base_pos_in_env_frame(
+    env: ManagerBasedRLEnv,
+    gear_cfg: SceneEntityCfg = SceneEntityCfg("medium_nist_gear"),
+    held_gear_base_offset: tuple[float, ...] = (2.025e-2, 0.0, 0.0),
+) -> torch.Tensor:
+    """Position of the held gear's insertion point (root + offset in gear frame) in env frame."""
+    gear: RigidObject = env.scene[gear_cfg.name]
+    gear_pos = wp.to_torch(gear.data.root_pos_w) - env.scene.env_origins
+    gear_quat = wp.to_torch(gear.data.root_quat_w)
+    held_off = (
+        torch.tensor(held_gear_base_offset, device=env.device, dtype=torch.float32).unsqueeze(0).expand(env.num_envs, 3)
+    )
+    return gear_pos + math_utils.quat_apply(gear_quat, held_off)


This appears to be a duplicate of the function above with the variables renamed.

Suggestion to unify these functions into some common function with general variable names.

alexmillane · 2026-04-14T09:29:15Z

+        reset_quat = reset_quat * flip.unsqueeze(-1)
+        self._prev_noisy_quat[env_ids] = reset_quat
+
+    def __call__(


Suggestion to break long functions up like this into smaller functions with informative names.

+1 right now it resolves params, reads robot state, injects noise, computes
velocities, reads force/action state, masks quaternion, then concatenates everything.
How about

def __call__(...): params = self._resolve_call_params(...) ft_state = self._read_fingertip_state(env, params) noisy_state = self._apply_sensor_noise(env, ft_state, params) motion_obs = self._compute_motion_observations(env, noisy_state) force_obs = self._compute_force_observations(env, params) action_obs = self._compute_action_observations(env) target_obs = self._compute_target_observations(env, noisy_state, params) return self._pack_policy_observation( target_obs=target_obs, fingertip_quat=self._mask_symmetric_quat(noisy_state.quat), motion_obs=motion_obs, force_obs=force_obs, action_obs=action_obs, )

xyao-nv · 2026-05-01T17:52:17Z


 import isaaclab.sim as sim_utils

+_REPO_ROOT = Path(__file__).resolve().parent.parent.parent


Why do we need it?

Assets are stored locally, the Nist assets have not been moved to the nucleus cloud storage.

xyao-nv · 2026-05-01T18:42:28Z

+
+    name = "gears_and_base"
+    tags = ["object"]
+    usd_path = str(_REPO_ROOT / "assets" / "gearbase_and_gears_gearbase_root.usd")


i have uploaded your NIST folder to S3. It will be under f"{ISAACLAB_NUCLEUS_DIR}/Arena/assets/object_library/NIST/" in 6 hrs

xyao-nv · 2026-05-01T18:45:22Z

+
+_FACTORY_ASSET_DIR = f"{ISAACLAB_NUCLEUS_DIR}/Factory"
+
+FRANKA_MIMIC_OSC_CFG = ArticulationCfg(


Can you add it to franka.py, as another embodiment?

xyao-nv · 2026-05-01T18:47:11Z

@seun Doherty Can you also include a few test funs on your env, e.g. obs/reward/event term for task? Feel free to refer our tests folder see if anything can be reused.

xyao-nv · 2026-05-01T18:49:46Z

+        # EE pose, force feedback, prev actions) instead of the default embodiment obs.
+        # The task_obs group adds privileged state for the critic separately.
+        embodiment.observation_config.policy.actions = None
+        embodiment.observation_config.policy.gripper_pos = None
+        embodiment.observation_config.policy.eef_pos = None
+        embodiment.observation_config.policy.eef_quat = None
+        embodiment.observation_config.policy.joint_pos = None
+        embodiment.observation_config.policy.joint_vel = None
+        embodiment.observation_config.policy.nist_gear_policy_obs = ObsTerm(
+            func=NistGearInsertionPolicyObservations,
+            params={
+                "robot_name": "robot",
+                "board_name": gears_and_base.name,
+                "peg_offset": list(peg_tip_offset),
+                "fingertip_body_name": "panda_fingertip_centered",
+                "force_body_name": "force_sensor",
+                "pos_noise_level": 0.0,
+                "rot_noise_level_deg": 0.0,
+                "force_noise_level": 0.0,
+            },
+        )
+        # OSC torque control uses task-specific penalties (action magnitude, jerk,
+        # contact force) instead of the default joint-level regularizers.
+        embodiment.reward_config.action_rate = None
+        embodiment.reward_config.joint_vel = None


All embodiment related overwrites shall be in franka.py with you added new embodiment.

xyao-nv · 2026-05-01T18:51:03Z

+        )
+        scene = Scene(assets=[table, assembled_board, medium_gear, gears_and_base, light])
+
+        grasp_cfg = GraspConfig(


This seems to embodiment dependent, can you move it to franka.py in your embodiment cass?

xyao-nv · 2026-05-01T18:51:48Z

 from isaaclab_assets.robots.franka import FRANKA_PANDA_HIGH_PD_CFG

+
+def franka_gripper_joint_setter(


It shall be in franka.py

Signed-off-by: odoherty <odoherty@nvidia.com>

xyao-nv · 2026-05-08T00:39:47Z

+            "end_effector_body_name": "panda_hand",
+            "finger_joint_names": "panda_finger_joint[1-2]",
+            "grasp_rot_offset": [1.0, 0.0, 0.0, 0.0],
+            "grasp_offset": [0.02, 0.0, -0.128],


Can you add a comment why the offset? Also better to have those numbers as CONSTANT

xyao-nv · 2026-05-08T00:50:44Z

+
+    @staticmethod
+    def add_cli_args(parser: argparse.ArgumentParser) -> None:
+        parser.add_argument("--embodiment", type=str, default="franka_nist_gear_osc", help="Robot embodiment")


Nit
In your env class name, it has mentioned osc_env.
So if you expect this env only works with embodiment equipped with osc ctrl, can you remove this args.
If you expect this env can work with others, can you rename the env class name?

xyao-nv · 2026-05-08T00:52:38Z

+            teleop_device=teleop_device,
+            env_cfg_callback=mdp.assembly_env_cfg_callback,
+            rl_framework_entry_point="rl_games_cfg_entry_point",
+            rl_policy_cfg="isaaclab_arena_examples.policy:nist_gear_insertion_osc_rl_games.yaml",


Can you avoid hardcoding the module path?
Use something similar to rsl_rl_policy like rl_policy_cfg=f"{base_rsl_rl_policy.__name__}:RLPolicyCfg"

xyao-nv · 2026-05-08T00:53:24Z

+            xy_threshold=xy_threshold,
+        )
+
+        task = NistGearInsertionTask(


Can you update the task name as NistGearInsertionRLTask?

Does it need to be NIST-dependent

xyao-nv · 2026-05-08T05:32:59Z

    )


+_FRANKA_MIMIC_OSC_USD_PATH = f"{ISAACLAB_NUCLEUS_DIR}/Factory/franka_mimic.usd"


Can you move all of _FRANKA_MIMIC_OSC* to franka/nist_gear_insertion/ and only have the emodiment class construction in franka.py

xyao-nv

Can you refactor the files in a cleaner structure, with modular func that are reusable, and readable?

isaaclab_arena/
    tasks/
      nist_gear_insertion/
        __init__.py
        task.py                 # NistGearInsertionTask, GearInsertionGeometryCfg
        geometry.py             # peg/held gear pose helpers, success geometry
        observations.py         # generic gear insertion obs terms
        rewards.py              # generic geometry/keypoint rewards
        terminations.py         # gear insertion success/drop terms
        events.py               # gear/grasp reset events, if task-generic
        grasp.py                # GraspCfg + reusable grasp reset helpers

    embodiments/
      franka/
        franka.py               # generic Franka embodiments
        nist_gear_insertion/
           __init__.py
          nist_gear_osc.py        # FrankaNistGearInsertionOscEmbodiment
          nist_gear_grasp.py      # Franka-specific GraspCfg factory/helper, optional

  isaaclab_arena_environments/
    mdp/
      nist_gear_insertion/
        __init__.py
        osc_action.py           # NistGearInsertionOscAction/Cfg
        osc_observations.py     # 24-D OSC policy observation
        osc_rewards.py          # OSC action/contact/success-pred rewards

    nist_assembled_gearmesh_osc_environment.py

xyao-nv · 2026-05-08T05:53:29Z

+    from isaaclab.managers import ObservationTermCfg
+
+
+def _offset_pos_in_env_frame(


These are not really observation-specific:

_offset_pos_in_env_frame()
check_gear_insertion_geometry()
held_gear_base_pos_in_env_frame()
peg_pos_in_env_frame()

I’d move them to:

isaaclab_arena/tasks/gear_insertion/geometry.py

xyao-nv · 2026-05-08T05:53:36Z

+    return root_pos + math_utils.quat_apply(root_quat, offset_tensor)
+
+
+def peg_pos_in_env_frame(


These are not really observation-specific:

_offset_pos_in_env_frame()
check_gear_insertion_geometry()
held_gear_base_pos_in_env_frame()
peg_pos_in_env_frame()

I’d move them to:

isaaclab_arena/tasks/gear_insertion/geometry.py

xyao-nv · 2026-05-08T05:53:42Z

+    return _offset_pos_in_env_frame(env, board, peg_offset)
+
+
+def held_gear_base_pos_in_env_frame(


These are not really observation-specific:

_offset_pos_in_env_frame()
check_gear_insertion_geometry()
held_gear_base_pos_in_env_frame()
peg_pos_in_env_frame()

I’d move them to:

isaaclab_arena/tasks/gear_insertion/geometry.py

xyao-nv · 2026-05-08T05:54:01Z

+    return quat_unique(quat)
+
+
+def check_gear_insertion_geometry(


These are not really observation-specific:

_offset_pos_in_env_frame()
check_gear_insertion_geometry()
held_gear_base_pos_in_env_frame()
peg_pos_in_env_frame()

I’d move them to:

isaaclab_arena/tasks/gear_insertion/geometry.py

xyao-nv · 2026-05-08T06:00:20Z

+    return peg_pos - held_base
+
+
+class NistGearInsertionPolicyObservations(ManagerTermBase):


NistGearInsertionPolicyObservations is not generic task observation logic. It depends on the
OSC action term through:

get_nist_gear_insertion_arm_action(env)

That couples task observations to a specific RL/OSC controller. I’d move it to something
like:

isaaclab_arena_environments/mdp/nist_gear_insertion/osc_observations.py

Keep gear_insertion_observations.py for generic observations like peg position, held gear
base position, and peg delta.

xyao-nv · 2026-05-08T06:01:56Z

+        self._robot_name: str = p.get("robot_name", "robot")
+        self._board_name: str = p.get("board_name", "gears_and_base")
+        self._peg_offset_values = tuple(p.get("peg_offset", [0.0, 0.0, 0.0]))
+        self._peg_offset = torch.tensor(self._peg_offset_values, device=env.device)
+        self._fingertip_body: str = p.get("fingertip_body_name", "panda_fingertip_centered")
+        self._force_body: str = p.get("force_body_name", "force_sensor")
+
+        self._pos_noise: float = p.get("pos_noise_level", 0.00025)
+        self._rot_noise_deg: float = p.get("rot_noise_level_deg", 0.1)
+        self._force_noise: float = p.get("force_noise_level", 1.0)


@dataclass class GearInsertionPolicyObsCfg: robot_name: str board_name: str peg_offset: list[float] fingertip_body_name: str force_body_name: str pos_noise_level: float = 0.0 rot_noise_level_deg: float = 0.0 force_noise_level: float = 0.0

xyao-nv · 2026-05-08T06:04:08Z

+        obs = torch.cat(
+            [
+                fingertip_pos_rel,
+                obs_quat,
+                ee_linvel,
+                ee_angvel,
+                noisy_force,
+                force_threshold,
+                prev_actions,
+            ],
+            dim=-1,


The concatenation is very dedicated, and readers have to count dimensions manually. Add named
constants or a short layout tuple:

POLICY_OBS_LAYOUT = ( ("fingertip_pos_rel", 3), ("fingertip_quat", 4), ("ee_linvel", 3), ("ee_angvel", 3), ("force", 3), ("force_threshold", 1), ("prev_actions", 7), )

xyao-nv · 2026-05-08T06:07:04Z

+        reset_quat = reset_quat * flip.unsqueeze(-1)
+        self._prev_noisy_quat[env_ids] = reset_quat
+
+    def __call__(


+1 right now it resolves params, reads robot state, injects noise, computes
velocities, reads force/action state, masks quaternion, then concatenates everything.
How about

def __call__(...): params = self._resolve_call_params(...) ft_state = self._read_fingertip_state(env, params) noisy_state = self._apply_sensor_noise(env, ft_state, params) motion_obs = self._compute_motion_observations(env, noisy_state) force_obs = self._compute_force_observations(env, params) action_obs = self._compute_action_observations(env) target_obs = self._compute_target_observations(env, noisy_state, params) return self._pack_policy_observation( target_obs=target_obs, fingertip_quat=self._mask_symmetric_quat(noisy_state.quat), motion_obs=motion_obs, force_obs=force_obs, action_obs=action_obs, )

xyao-nv · 2026-05-08T06:20:35Z

@@ -0,0 +1,297 @@
+# Copyright (c) 2025-2026, The Isaac Lab Arena Project Developers (https://github.com/isaac-sim/IsaacLab-Arena/blob/main/CONTRIBUTORS.md).


Sorry I cannot understand thru your code.
Pulling Codex for help

• For rewards, I’d suggest two levels of cleanup: split the reward config builder, and split
the reward term implementations.

GearInsertionRewardsCfg

Right now the constructor builds everything inline. I’d make the reward groups obvious:

class GearInsertionRewardsCfg:
def init(...):
geometry = GearInsertionRewardGeometry(...)
self._add_keypoint_rewards(geometry)
self._add_geometry_bonus_rewards(geometry)
self._add_action_penalty_rewards()
self._add_success_prediction_reward(geometry)

Suggested helpers:

def _make_common_keypoint_params(...)
def _make_geometry_bonus_params(...)
def _add_keypoint_rewards(...)
def _add_geometry_bonus_rewards(...)
def _add_osc_penalty_rewards(...)
def _add_success_prediction_reward(...)

That makes the reward intent scan as:

alignment shaping
insertion bonuses
control penalties
success prediction loss

instead of one long constructor.

gear_peg_keypoint_squashing

This term does several things in call():

validates num_keypoints

resolves held gear offset

computes held gear insertion pose

resolves peg offset

computes target peg pose

applies random target noise

computes keypoint distance

applies squashing function

I’d break it like:

def call(...):
self._validate_num_keypoints(num_keypoints)
held_base_pos, gear_quat = self._compute_held_base_pose(env, gear_cfg,
held_gear_base_offset)
target_pos, target_quat = self._compute_target_pose(env, board_cfg, peg_offset)
target_pos = self._apply_target_noise(target_pos, peg_offset_xy_noise)
return self._compute_squashed_keypoint_reward(
target_pos, target_quat, held_base_pos, gear_quat, keypoint_scale, squash_a, squash_b
)

That would make the reward’s meaning much clearer.

Shared Geometry

gear_insertion_geometry_bonus and success_prediction_error both call
_compute_gear_position_success(), and termination does similar work too. I’d move this into a
shared geometry module:

isaaclab_arena/tasks/gear_insertion/geometry.py

with helpers like:

compute_peg_pose(...)
compute_held_gear_base_pose(...)
compute_insertion_success(...)

Then reward code becomes much easier to read because it says what it wants, not how to
rebuild frames every time.

OSC-Specific Rewards

These rewards are policy/controller-specific:

osc_action_magnitude_penalty
osc_action_delta_penalty
wrist_contact_force_penalty
success_prediction_error

They depend on:

get_nist_gear_insertion_arm_action(env)

I’d suggest moving them out of generic task rewards into something like:

isaaclab_arena_environments/mdp/nist_gear_insertion/osc_rewards.py

Generic task rewards should be things like keypoint alignment and insertion success. OSC
action smoothness, force threshold, and seventh-action success prediction belong to the OSC
RL environment.

success_prediction_error

This one especially deserves helpers because it has stateful behavior:

def call(...):
true_success = self._compute_true_success(...)
self._update_prediction_loss_gate(true_success, delay_until_ratio)
pred = self._read_success_prediction(env)
return self._prediction_error(true_success, pred)

Also consider renaming _pred_scale to something clearer like:

_success_prediction_loss_enabled

or

_prediction_loss_weight

because _pred_scale hides the fact that it flips on after enough successes.

xyao-nv · 2026-05-08T06:32:37Z

    return terminated
+
+
+def gear_mesh_insertion_success(


Sound to me both of funcs are task-specific enough that they probably should not live in the global tasks/terminations.py. Consider moving them under a NIST/gear-insertion module, e.g. tasks/gear_insertion/terminations.py, especially since the default asset names are medium_nist_gear and gears_and_base.

xyao-nv · 2026-05-08T06:33:27Z

 from isaaclab.sensors.contact_sensor.contact_sensor import ContactSensor
 from isaaclab.utils.math import combine_frame_transforms

+from isaaclab_arena.tasks.observations.gear_insertion_observations import check_gear_insertion_geometry


A termination importing geometry from an observation module is backwards. check_gear_insertion_geometry() is not observation-specific. I’d move it to shared geometry code:

isaaclab_arena/tasks/gear_insertion/geometry.py

with helpers like:

compute_peg_pos(...) compute_held_gear_base_pos(...) check_insertion_success(...)

xyao-nv · 2026-05-08T06:36:14Z

+        self._smoothed_actions[env_ids, 3:5] = 0.0
+        self._smoothed_actions[env_ids, 5] = yaw_action
+        self._smoothed_actions[env_ids, 6] = -1.0


ACTION_DIM = 7 POS_SLICE = slice(0, 3) ROLL_IDX = 3 PITCH_IDX = 4 YAW_IDX = 5 SUCCESS_IDX = 6

xyao-nv · 2026-05-08T06:37:10Z

+        target_yaw = _wrap_yaw_to_action_range(target_yaw)
+        return _target_yaw_to_action(target_yaw)
+
+    def reset(self, env_ids: Sequence[int] | None = None) -> None:


def reset(self, env_ids=None): env_ids, num_envs = self._normalize_env_ids(env_ids) if num_envs == 0: return super().reset(env_ids) self._reset_action_filter(env_ids, num_envs) self._reset_command_thresholds(env_ids, num_envs) self._reset_contact_state(env_ids, num_envs) self._reset_initial_smoothed_actions(env_ids) self._reset_debug_state(env_ids)

xyao-nv · 2026-05-08T06:42:01Z

+        """Return the next end-effector position command in the robot base frame."""
+        peg_pos_b = self._get_peg_pos() + self.fixed_pos_noise
+
+        pos_actions = self._smoothed_actions[:, :3]


docstring saying why :3?

xyao-nv · 2026-05-08T06:42:06Z

+            self._force_smoothing_factor * raw_force + (1.0 - self._force_smoothing_factor) * self.force_smooth
+        )
+
+    def _compute_target_position(self, ee_pos_b: torch.Tensor) -> torch.Tensor:


_compute_bounded_target_position_from_policy_action(...)

xyao-nv · 2026-05-08T06:42:21Z

+        clipped_delta = self._clip_delta_with_dead_zone(self.delta_pos, self._pos_thresh, self._pos_dead_zone)
+        return ee_pos_b + clipped_delta
+
+    def _compute_target_orientation(self, ee_quat_b: torch.Tensor) -> torch.Tensor:


_compute_bounded_target_orientation_from_policy_yaw(...)

xyao-nv · 2026-05-08T06:42:57Z

+
+    def _compute_target_orientation(self, ee_quat_b: torch.Tensor) -> torch.Tensor:
+        """Return the next end-effector orientation command in the robot base frame."""
+        target_yaw = _action_to_target_yaw(self._smoothed_actions[:, 5])


docstring saying why 5 or constant defining 5:= yaw

xyao-nv · 2026-05-08T06:44:34Z

+        self._force_body_idx: int | None = None
+        self._force_smoothing_factor = cfg.force_smoothing_factor
+
+    def _get_peg_pos(self, env_ids: torch.Tensor | None = None) -> torch.Tensor:


seems reusable -- compute_asset_local_offset_pos(env, fixed_asset_cfg, peg_offset) as a shared helper for obs/rewards/termination?

xyao-nv · 2026-05-08T06:49:06Z

+
+def _target_yaw_to_action(yaw: torch.Tensor) -> torch.Tensor:
+    """Map commanded yaw back to the normalized policy interval."""
+    return (yaw + math.pi) / math.radians(270.0) * 2.0 - 1.0


xyao-nv · 2026-05-08T06:50:31Z

+
+        self._smoothed_actions = torch.zeros(self.num_envs, 7, device=self.device)
+        self.ema_factor = torch.full((self.num_envs, 1), 0.05, device=self.device)
+        self._pos_bounds = torch.tensor(cfg.pos_action_bounds, device=self.device)


A few names are a little too generic:

_pos_bounds _pos_thresh _rot_thresh

Clearer names:

_position_action_bounds _position_step_limits _rotation_step_limits

xyao-nv

first off — really appreciate the volume of work you've put into this, it's clear a lot of effort has gone in. 🙏

I spent substantial amounts of time going through your updated one and left detailed comments on basically every file with suggestions around file structure and function-level refactors. The main themes are modularity, reusability, and readability — right now it's a bit hard to follow and reuse pieces in isolation. For the math-heavy parts in particular, I ran it by Codex and included its expert suggestions inline so you have a second technical perspective to pull from.

And we achieve the themes thru, but not limited to

Object orientated class
Modular, reusable helper funcs
Docstring that explains domain-specific math ops

Caveat: I may be missing some of the domain nuance here, so please push back on anything where the suggestion doesn't fit — happy to talk through it.

Could you also refactor out the RL-training-related scripts until we enable rl_game interop first? That matches what we aligned on earlier this week.

Nit -- if you follow any published code/work to implement this RL task, can you include it in your MR? It will help us understand how task is constructed while reading your code.

seun-doherty · 2026-05-08T20:56:12Z

first off — really appreciate the volume of work you've put into this, it's clear a lot of effort has gone in. 🙏

I spent substantial amounts of time going through your updated one and left detailed comments on basically every file with suggestions around file structure and function-level refactors. The main themes are modularity, reusability, and readability — right now it's a bit hard to follow and reuse pieces in isolation. For the math-heavy parts in particular, I ran it by Codex and included its expert suggestions inline so you have a second technical perspective to pull from.

And we achieve the themes thru, but not limited to

Object orientated class

Modular, reusable helper funcs

Docstring that explains domain-specific math ops

Caveat: I may be missing some of the domain nuance here, so please push back on anything where the suggestion doesn't fit — happy to talk through it.

Could you also refactor out the RL-training-related scripts until we enable rl_game interop first? That matches what we aligned on earlier this week.

Nit -- if you follow any published code/work to implement this RL task, can you include it in your MR? It will help us understand how task is constructed while reading your code.

Thank you, @xinjie Yao, for the detailed review. I know this is a large and somewhat complex change, so I appreciate the time you spent going through it carefully.

I have made a number of updates so far: refactoring large functions into smaller helpers, moving task-specific code into more appropriate modules, improving modularity/reusability/readability, and adding comments/docstrings where the logic is not obvious. The task is fairly specific and was originally adapted from the Isaac Lab Forge gear insertion direct RL environment, so part of the complexity comes from converting that logic into a manager-based Arena environment. I will add concise references back to the Forge source patterns and explain the domain-specific math/control logic where needed.

I agree with your structural feedback, especially around moving task-specific functions out of generic observation/termination modules and into gear-insertion-specific files. That direction makes the code easier to follow and review.

I would like to sync next week with you and Alex after I finish this cleanup pass, since I have been iterating on the structure and want to make sure the final organization aligns with the team’s expectations.

Signed-off-by: odoherty <odoherty@nvidia.com>

isaaclab-review-bot

🤖 Isaac Lab Review Bot

Summary

This PR adds a comprehensive NIST gear insertion RL workflow with Franka Panda OSC control, following the Factory/Forge insertion task architecture. The implementation includes well-structured task definitions, keypoint-squashing rewards, domain randomization, and RL-Games integration for LSTM-based asymmetric actor-critic training.

Findings

🔵 Suggestion: The _FRANKA_MIMIC_OSC_JOINT_POS dict in franka_osc.py (line ~65) only includes 8 joints (7 arm + panda_finger_joint2). Consider adding panda_finger_joint1 for completeness, though the mimic joint structure may handle this automatically.

🔵 Suggestion: In osc_action.py line ~292, the _ensure_force_body_idx method uses get_link_incoming_joint_force() which is noted as a private API. Consider adding a brief comment about Isaac Lab version compatibility if this API changes.

🔵 Suggestion: The NistGearInsertionPolicyObservations class zeros the z/w quaternion components (line ~69 in osc_observations.py) for symmetry reduction. A brief docstring explaining this design choice for gear symmetry would help maintainability.

🔵 Suggestion: In events.py, the _run_grasp_ik method (lines 176-198) implements iterative IK. Consider adding a log warning if max iterations are reached without convergence, to help debugging reset-time issues.

🟢 Positive: Excellent test coverage in test_nist_gear_insertion_task.py covering geometry, grasp config, OSC configs, and environment wiring.

🟢 Positive: Clean separation between generic task rewards (rewards.py) and OSC-specific penalties (osc_rewards.py).

🟢 Positive: The grasp reset event properly follows Factory/Forge patterns with IK-based placement and state flushing.

Verdict

Ship it ✅

Well-architected addition following established Isaac Lab patterns. The code is clean, well-documented, and has comprehensive test coverage. The suggestions above are minor improvements that could be addressed in follow-up PRs.

📝 Update (70ed06e): Added instance_name parameter to MediumNistGear.__init__() in object_library.py to align with parent LibraryObject constructor signature. Good fix for consistency.

greptile-apps · 2026-05-13T21:06:25Z

+        """Resolve the wrist force-sensor body index."""
+        if self._force_body_idx is not None:
+            return
+        body_ids, body_names = self._asset.find_bodies(self.cfg.force_body_name)


find_bodies is called with a bare string instead of a list. Every other find_bodies call in this PR wraps the name in a list (e.g. robot.find_bodies([body_name]) in osc_observations.py, events.py, and terminations.py). Isaac Lab's find_bodies accepts a list[str] of regex patterns; passing a bare string makes Python iterate over each character as a separate pattern, which will match zero bodies and immediately raise the ValueError("Found 0 matches") error below — blocking the environment from starting at all.

Suggested change

body_ids, body_names = self._asset.find_bodies(self.cfg.force_body_name)

body_ids, body_names = self._asset.find_bodies([self.cfg.force_body_name])

seun-doherty · 2026-05-13T21:07:42Z

NIST Gear Insertion PR Update

This update refactors and cleans up the NIST gear insertion implementation for review.

Summary

Restructured NIST gear insertion task logic into a dedicated package:
- isaaclab_arena/tasks/nist_gear_insertion/
- Split task config, geometry helpers, observations, rewards, terminations, and reset events into focused
  modules.
Moved OSC-specific MDP logic into:
- isaaclab_arena_environments/mdp/nist_gear_insertion/
- Keeps controller/policy-specific code separate from generic task logic.
Moved Franka NIST gear insertion OSC wiring out of the generic Franka file into:
- isaaclab_arena/embodiments/franka/nist_gear_insertion/
Removed the in-repo RL-Games training wrapper/config.
- Training will be handled outside the repo.
- Kept rl_games_action_policy.py for checkpoint playback/evaluation.
Updated NIST asset paths to use the IsaacLab Arena Nucleus path style:
- f"{ISAACLAB_NUCLEUS_DIR}/Arena/assets/object_library/NIST/..."
Addressed reviewer feedback around:
- Moving shared geometry out of observation modules.
- Moving task-specific terminations into the NIST gear insertion package.
- Separating OSC-specific rewards/observations/actions from generic task code.
- Replacing hard-coded action indices with named constants.
- Simplifying large functions.
- Improving comments/docstrings where the logic came from the Isaac Lab Factory/Forge gear insertion
  setup.
Refined tests to focus on relevant behavior:
- NIST task config/registration structure.
- RL-Games action policy behavior.

Ready for review @xyao-nv and @alexmillane

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

Signed-off-by: odoherty <odoherty@nvidia.com>

isaaclab-review-bot

🤖 Isaac Lab Review Bot — PR #567 Multi-Perspective Review

📋 Summary

This PR adds a complete NIST gear insertion RL task example for Isaac Lab Arena: a Franka Panda robot inserts a medium gear onto a peg on the NIST assembly board using operational-space control (OSC) and RL Games PPO. The implementation spans 21 new/modified files covering task logic, OSC action term, embodiment configs, observations, rewards, terminations, asset registry, environment wiring, RL Games policy wrapper, and tests.

🔬 Lens 1: Isaac Lab Expert — Task Design & Architecture

Strengths:

✅ Well-structured separation of concerns: task (isaaclab_arena/tasks/), environment-specific MDP (isaaclab_arena_environments/mdp/), and embodiment (embodiments/franka/nist_gear_insertion/)
✅ Follows Factory/Forge patterns: DLS IK for reset-time grasp placement, keypoint squashing rewards, asset-relative OSC commands
✅ GearInsertionGeometryCfg is a clean single source of truth for insertion dimensions shared across obs/rewards/terminations
✅ The OSC action term correctly handles EMA smoothing, dead zones, per-step delta clipping, and the 270° yaw remapping
✅ Quaternion convention migration (wxyz → xyzw for Isaac Lab 3.0) handled correctly with rot=(0.0, 0.0, 0.0, 1.0) identity
✅ Domain randomization events follow the established Arena pattern (mass, pose, actuator gains, joint friction)

Observations / Suggestions:

Mutable default in GraspCfg — While grasp_rot_offset and grasp_offset use field(default_factory=...), the type annotations say list[float]. Consider tuple[float, ...] for immutability signals, or document that callers should not mutate these.
peg_offset_for_obs vs peg_offset_from_board — The geometry config has two peg offsets differing only in Z (0.0 vs 0.025). The naming could be clearer — perhaps peg_base_offset and peg_tip_offset would better communicate the physical meaning (base of peg vs tip where the gear first engages).
Hardcoded gpu_collision_stack_size — The commit messages mention 4 GB for contact-heavy scenes, but this isn't visible in the diff. Consider adding a comment in the environment file or a SimCfg override so future maintainers understand the requirement.
NistBoardAssembled has rigid_body_enabled=False — This is intentional (visual-only), but a brief comment in object_library.py explaining the visual-only role would help (currently only in the docstring, not near the config).

🕵️ Lens 2: Silent Failure Hunter

Potential Issues:

wp.to_torch() without .clone() in some paths — In terminations.py:gear_dropped_from_gripper, wp.to_torch(robot.data.body_pos_w)[:, eef_indices[0]] and wp.to_torch(gear.data.root_pos_w) are used without .clone(). If warp returns views into GPU buffers that get overwritten between steps, this could produce stale data. The events file correctly uses .clone() — consider consistency here.
_ensure_force_body_idx in OSC action — If the force_body_name body doesn't exist on the USD, the error message is clear. However, the _update_smoothed_force is called at the top of process_actions before OSC sets the command. If the articulation hasn't been stepped yet (first call), get_link_incoming_joint_force() may return zeros or uninitialized values. The nan_to_num clamping mitigates this, but documenting the expected first-step behavior would help.
success_prediction_error._prediction_loss_weight is instance state that never resets — Once any batch achieves delay_until_ratio success, the weight stays at 1.0 forever. If training starts with an initial transient success spike (e.g., due to initialization luck), the auxiliary loss activates prematurely. Consider resetting on curriculum boundaries or using a moving average.
_RlGamesInferenceEnvWrapper monkey-patches env.reset — The try/finally block correctly restores it, but if RlGamesVecEnvWrapper.__init__ stores a reference to the patched reset method (e.g., self._reset_fn = env.reset), the wrapper would keep the patched version. The test covers this case — good.
compute_asset_local_offset_pose with optional num_envs — When num_envs is explicitly passed and smaller than env.num_envs, the slicing [:num_envs] works. But if ever passed a value larger than available envs, it silently returns a shorter tensor without error. Consider an assertion.

🧪 Lens 3: Test Coverage

What's tested (good):

✅ test_nist_gear_insertion_task.py (265 lines) — Thorough config-level tests validating that geometry feeds into observations, rewards, terminations, and events correctly
✅ test_rl_games_action_policy.py (43 lines) — Validates the inference wrapper doesn't trigger extra resets
✅ Tests cover geometry success thresholds, OSC yaw mapping, observation layout, and CLI args
✅ Domain randomization event wiring is tested (mass, pose, gains, friction)
✅ Edge case: asserting that enable_randomization=True without a grasp_cfg raises

Gaps:

⚠️ No integration/sim test — All tests are config-level/unit tests. There's no test that actually creates the environment and runs a few steps. This is understandable (needs Isaac Sim runtime) but should be noted.
⚠️ OSC action term not unit-tested — The NistGearInsertionOscAction class (477 lines) has complex logic (EMA, dead zones, yaw wrapping, force smoothing) that's only tested indirectly through the yaw mapping test. Consider adding isolated tests for _clip_delta_with_dead_zone, _reset_action_filter, and _reset_initial_smoothed_actions.
⚠️ Reward/termination computation not tested with mock tensors — The geometry test (test_gear_insertion_geometry_success_thresholds) is good but doesn't test the keypoint distance computation or squashing function behavior.
⚠️ osc_observations.py reset logic — The _flip_quats randomization and velocity computation are tested via test_policy_observation_layout_masks_symmetric_quaternion_and_computes_velocity, but not the full __call__ path with a mock environment.

🏁 Verdict

Overall: Approve with minor suggestions 👍

This is a well-structured, substantial contribution that follows Isaac Lab Factory/Forge patterns. The code quality is high — good separation of concerns, proper use of configclass, and defensive NaN/inf clamping throughout. The test coverage is solid for config-level validation. The main areas for improvement are:

Minor naming clarity (peg_offset_for_obs → peg_tip_offset)
Consistent .clone() usage with wp.to_torch() in terminations
Consider unit tests for the OSC action term's filtering logic
Document the gpu_collision_stack_size requirement in the environment definition

None of these are blockers — the PR is in good shape for merge.

isaaclab-review-bot

Code Review: NIST Gear Insertion Task

This PR adds a comprehensive NIST gear insertion RL workflow with operational-space control. The implementation follows Isaac Lab Factory/Forge patterns well. Below are findings that should be addressed before merge.

🔴 Critical Issues

1. `success_z_fraction` Default Inconsistency

File: isaaclab_arena/tasks/nist_gear_insertion/task.py (line ~55) vs terminations.py (line ~28)

GearInsertionGeometryCfg defaults to success_z_fraction: float = 0.30, but gear_mesh_insertion_success() defaults to 0.20. This means environments using the geometry config default will have different success thresholds than direct termination function calls.

# task.py - GearInsertionGeometryCfg
success_z_fraction: float = 0.30  # 30% of peg height

# terminations.py - gear_mesh_insertion_success default
success_z_fraction: float = 0.20  # 20% of peg height

Impact: Training vs evaluation success metrics may disagree silently.
Fix: Align both defaults to the same value (recommend 0.30 based on commit message mentioning loosening threshold).

2. `_prediction_loss_weight` Never Resets on Episode Boundaries

File: isaaclab_arena_environments/mdp/nist_gear_insertion/osc_rewards.py (lines ~133-145)

def _update_prediction_loss_weight(self, true_success: torch.Tensor, delay_until_ratio: float) -> None:
    """Enable the auxiliary loss once enough environments have reached success."""
    if true_success.float().mean() >= delay_until_ratio:
        self._prediction_loss_weight = 1.0

Once _prediction_loss_weight is set to 1.0, it never resets to 0. The ManagerTermBase.reset() is not overridden. This means:

Early in training when few envs succeed, weight stays 0
Once 25% succeed once, weight becomes 1.0 forever
If policy regresses and success rate drops, weight remains 1.0

Fix: Either:

Override reset() to re-evaluate the condition, or
Make this a per-step check that can decrease (current behavior may be intentional curriculum—if so, add a comment)

🟡 Medium Issues

3. Potential Race Condition in IK Jacobian Indexing

File: isaaclab_arena/tasks/nist_gear_insertion/events.py (lines ~180-190)

# Body index is shifted by one when indexing ``root_view.get_jacobians()``.
self.jacobi_body_idx = self.eef_idx - 1
assert self.jacobi_body_idx >= 0, "End-effector body must not be the articulation root."

The -1 offset assumes the Jacobian excludes the root body. This is Isaac Lab internal behavior that could change. Consider adding a more robust check or referencing the Isaac Lab API documentation version.

4. Missing `reset()` Override in `gear_peg_keypoint_squashing`

File: isaaclab_arena/tasks/nist_gear_insertion/rewards.py (lines ~90-100)

The reward term samples _offset_noise in its custom reset() method, but the base ManagerTermBase.reset() is called through super(). Verify the reset ordering doesn't cause the noise to be applied before observation terms read the peg position.

5. Force Sensor Body Resolution Could Fail Silently in Some Configurations

File: isaaclab_arena_environments/mdp/nist_gear_insertion/osc_action.py (lines ~230-245)

def _ensure_force_body_idx(self) -> None:
    if self._force_body_idx is not None:
        return
    body_ids, body_names = self._asset.find_bodies(self.cfg.force_body_name)
    if len(body_ids) != 1:
        raise ValueError(...)

The error message is good, but consider logging a warning if the USD doesn't have contact sensors enabled, since activate_contact_sensors=True is set in the embodiment but the action term doesn't verify this.

🟢 Minor / Style

6. Redundant `.clone()` Calls on Warp Tensors

Files: Multiple locations in events.py, osc_action.py

joint_pos = wp.to_torch(self.robot.data.joint_pos)[env_ids].clone()

wp.to_torch() already returns a new tensor. The .clone() may be unnecessary unless there's a specific mutation concern. Consider removing for performance, or add a comment explaining why cloning is needed.

7. Test File Uses Hardcoded Geometry Instead of Importing Constants

File: isaaclab_arena/tests/test_nist_gear_insertion_task.py (lines ~12-14)

_PEG_BASE_OFFSET = (0.02025, 0.0, 0.0)
_PEG_TIP_OFFSET = (0.02025, 0.0, 0.025)

These duplicate values from GearInsertionGeometryCfg. If the config defaults change, tests won't catch the drift.

✅ Strengths

Clean separation between task-level geometry rewards and controller-specific OSC penalties
Comprehensive test coverage for geometry, observations, and CLI
Good docstrings explaining Factory/Forge alignment
Proper quaternion convention handling (xyzw) with explicit comments
Domain randomization follows established patterns

Recommendation: Address the critical issues (1-2) before merge. The medium issues are lower priority but worth tracking.

Signed-off-by: odoherty <odoherty@nvidia.com>

greptile-apps · 2026-06-02T00:30:50Z

+            func=mdp_isaac_lab.randomize_actuator_gains,
+            mode="reset",
+            params={
+                "asset_cfg": SceneEntityCfg("robot", joint_names=arm_joints),
+                "stiffness_distribution_params": (0.75, 1.5),
+                "damping_distribution_params": (0.3, 3.0),
+                "operation": "scale",
+                "distribution": "log_uniform",
+            },
+        )
+        cfg.robot_joint_friction = EventTermCfg(
+            func=mdp_isaac_lab.randomize_joint_parameters,
+            mode="reset",
+            params={
+                "asset_cfg": SceneEntityCfg("robot", joint_names=arm_joints),
+                "friction_distribution_params": (0.3, 0.7),
+                "operation": "add",
+                "distribution": "uniform",
+            },


joint_names receives a bare string — domain randomization silently no-ops

arm_joints is the string "panda_joint[1-7]". SceneEntityCfg expects joint_names: list[str]; receiving a raw string causes Python to iterate over its characters ('p', 'a', 'n', …) rather than treating the whole string as a regex. Isaac Lab will find zero matching joints, so both robot_actuator_gains and robot_joint_friction randomization silently apply to no joints. Every other SceneEntityCfg / find_joints call in this PR wraps the pattern in a list (e.g., franka_osc_cfg.py line 85, events.py lines 118–119). The fix is joint_names=[arm_joints].

seun-doherty added 5 commits April 9, 2026 16:26

Add NIST gear insertion task with RL-Games training pipeline

012b8e3

NIST peg-insert gear assembly environment: custom OSC action term, keypoint squashing rewards, IK gripper reset, and domain randomization. Includes RL-Games PPO config, train/play scripts, and policy runner wrapper.

seun-doherty requested a review from alexmillane April 9, 2026 23:37

greptile-apps Bot reviewed Apr 9, 2026

View reviewed changes

kellyguo11 reviewed Apr 9, 2026

View reviewed changes

isaaclab-review-bot Bot mentioned this pull request Apr 9, 2026

Seun/nist gear insertion task example #566

Closed

seun-doherty requested a review from kellyguo11 April 10, 2026 17:12

kellyguo11 reviewed Apr 10, 2026

View reviewed changes

seun-doherty force-pushed the seun/nist_gear_insertion_task_example branch from 7e35c6f to b5d877e Compare April 10, 2026 17:29

seun-doherty requested a review from kellyguo11 April 10, 2026 17:30

alexmillane reviewed Apr 14, 2026

View reviewed changes

xyao-nv reviewed May 1, 2026

View reviewed changes

Add NIST gear insertion task

508fdb8

Signed-off-by: odoherty <odoherty@nvidia.com>

seun-doherty requested review from cvolkcvolk, peterd-NV, qianl-nv, viiik-inside and zhx06 as code owners May 6, 2026 23:29

seun-doherty requested a review from xyao-nv May 6, 2026 23:36

xyao-nv reviewed May 7, 2026

View reviewed changes

Comment thread isaaclab_arena/assets/object_library.py Outdated