v0.3 Add EnvGenAgent for agentic env gen by qianl-nv · Pull Request #718 · isaac-sim/IsaacLab-Arena

qianl-nv · 2026-05-26T14:40:43Z

Summary

Add EnvironmentGenerationAgent to convert user prompt to EnvironmentIntentSpec (JSON)

Detailed description

Reason: Adds first piece in the agentic environment generation pipeline: call agentic inference endpoint to infer task, relation, robot & object names from prompt
Changed:
- Add a new EnvironmentGenerationAgent class that constructs the available asset/relation/task from registry into prompt and query the inference endpoint.
- Add a new EnvironmentIntentSpec for specifying the output format of the agent. Task/Relations specs are shared with ArenaEnvGraph. A separate Item spec for foreground/distractor objects are separately defined since the agent returns query keywards instead of exact asset names from the registry.
- Refactor and regroup ArenaEnvGraphSpec types file so the Task/Relation spec can be shared between agent Intent and the full ArenaEnvGraph.
- Add agent-ready decorator to three tasks we plan to support for v0.3 milestone. EnvironmentGenerationAgent only use agent-ready tasks in prompt.
Impact: Task/Relation specs are refactored/fields renamed for clarify. This changes the ArenaEnvGraphSpec yaml fields.

To run: python -m isaaclab_arena.agentic_environment_generation.try_environment_intent_schema

Resolver matches object/robot proposal from LLM (based on name/tag) to the exact entries in the arena regestry. It outputs an ArenaEnvGraphSpec with the nodes, tasks and initial state scene graph filled in based on LLM output + resolver match.

… always called when loading from yaml/dict

isaaclab-review-bot

Update (24bc9bb):

Resolved:

✅ pydantic>=2.0 added to RUNTIME_DEPS in setup.py
✅ Old llm_env_gen module replaced by cleaner agentic_environment_generation module (JSON parsing issue no longer applicable)

Still present in new module:

⚠️ assert self.api_key pattern persists in environment_generation_agent.py:220 — consider explicit ValueError for robustness under python -O
⚠️ test_generate_spec_against_live_endpoint still missing @pytest.mark.skipif for NV_API_KEY

New changes look good:

Task parameterization via params: dict[str, str] is cleaner than subject/target
required_task_init_param_names() introspection is a nice pattern
Code organization with section comments improves readability

Update (e5feae7):

Good refactor — structured_output_utils.py deleted and utilities split into a leaner agent_utils.py (keeping only ping, build_strict_schema, extract_response_text). The heavy check_structured_output_support() pre-flight probe is removed from the constructor, which simplifies initialization and removes the extra LLM round-trip on startup.

Resolved from prior review:

✅ Flaky live test (test_default_model_supports_structured_output) removed along with test_structured_output_utils.py

Still open:

⚠️ assert self.api_key pattern still in environment_generation_agent.py (previously noted)

Looks good:

New test_agent_utils.py covers the retained helpers with clean mocked tests
Test fixture simplified (stub_openai now returns single response instead of side_effect list)---

Update (21aa0d2):

Major refactor — dataclass→Pydantic migration for env-graph spec, unified RelationSpec/TaskSpec shared types, new features (random_yaw_init, NotNextTo relation, CLI override specs, eval runner --chunk_size subprocess splitting).

Previous concerns:

⚠️ assert self.api_key pattern — still present (unchanged file aside from target→parent rename in prompt strings)
⚠️ Missing @pytest.mark.skipif on live test — not addressed in this batch

New changes look good:

Pydantic model_validator approach gives better error messages than manual assert parsing
RelationSpec / TaskSpec shared between agent intent and env-graph specs eliminates duplication
NotNextTo loss strategy (distance-to-nearest-exit) avoids flat plateaus — nice gradient design
random_yaw_init with conservative bbox enclosure is correctly integrated into solver, validation, and runtime placement paths
PooledObjectPlacer reproducibility fix (per-env random.Random instances via get_rngs) is solid
Eval runner --chunk_size subprocess dispatch is a pragmatic fix for host memory leaks
CLI override specs in YAML is a clean extensibility pattern
Comprehensive test coverage for all new features

Update (eb8f802):

Small cleanup pass (2 commits, 4 files). No new features or behavioral changes:

environment_intent_spec.py: Removed redundant ObjectRelationLibraryRegistry import (relation validation now handled by RelationSpec itself). Validator renamed _validate_catalogue_kinds_are_registered → _validate_agent_intent_tasks and switched from raise ValueError to assert for consistency with the rest of the codebase. Removed dead sentence about "implicit intermediate state graphs" from docstring.
arena_env_graph_types.py: TaskSpec._validate_task_kind converted from model_validator → field_validator("kind") (more idiomatic for single-field checks). RelationSpec._validate_kind_and_arity switched from raise ValueError to assert. _normalize_relation_params helper moved below RelationSpec (closer to its only caller ArenaEnvGraphSpatialConstraintSpec).
arena_env_graph_spec.py: RelationSpec/TaskSpec removed from __all__ re-exports (callers should import from arena_env_graph_types directly).
arena_env_graph_conversion_utils.py: Comment typo fix ("subject (child)" → "subject").

Note on assert vs raise: The shift from `raise ValueError` to `assert` in Pydantic validators is a style choice, but it means these checks will be silently skipped under `python -O` (optimized mode). Since this is test/development tooling unlikely to run optimized, it is acceptable here — just worth documenting as a convention.

Update (9dbd77f):

Schema flattening — ArenaEnvGraphTaskSpec now inherits from TaskSpec (instead of wrapping task: TaskSpec), and ArenaEnvGraphSpatialConstraintSpec now inherits from RelationSpec (instead of wrapping relation: RelationSpec). This removes one level of nesting in both the Python API and the YAML serialization format.

Also removes the task_description field from EnvironmentIntentSpec (redundant with the existing reasoning field).

Changes are clean:

Inheritance approach is idiomatic Pydantic — avoids nested .task.kind / .relation.subject access patterns
All call sites, validation utilities, conversion utilities, and tests updated consistently
YAML test fixture reformatted to match flat structure (no semantic change)
_minimal_env_graph_data() test helper updated, error-path tests adjusted to new flat keys
No behavioral regressions expected — this is purely structural

No new concerns. Previous notes still apply (⚠️ assert self.api_key pattern, ⚠️ live test skipif).

greptile-apps · 2026-05-26T14:46:11Z

Greptile Summary

This PR introduces EnvironmentGenerationAgent, the first piece of the agentic env-gen pipeline: it queries an OpenAI-compatible structured-output endpoint with a prompt and returns a validated EnvironmentIntentSpec (background, embodiment, items, spatial relations, tasks). It also refactors ArenaEnvGraphSpec field names for clarity (type→kind, parent/child→subject/reference, task_args→params), sharing SpatialRelationSpec and TaskSpec base classes between the agent intent and the full env-graph spec.

New agentic_environment_generation package: EnvironmentGenerationAgent builds asset/relation/task catalogues from the live registries, munges the Pydantic schema into strict-mode format via apply_strict_constraints, and sends the request with response_format=json_schema. Defensive guards for empty choices and DeepSeek's reasoning_content channel are included.
Schema refactoring in arena_env_graph_types.py: ArenaEnvGraphSpatialConstraintSpec → ArenaEnvGraphSpatialRelationSpec with renamed fields; ArenaEnvGraphTaskSpec now extends the new shared TaskSpec; all conversion utils, tests, and YAML fixtures updated consistently.
@agent_ready decorator added to PickAndPlaceTask, OpenDoorTask, and CloseDoorTask; only decorated tasks appear in the agent prompt.

Confidence Score: 5/5

Safe to merge; the schema refactoring is consistent across types, tests, YAML fixtures, and conversion utils, and the new agent code is well-guarded.

The field renames (subject/reference/kind) are applied uniformly across all affected files including the YAML test fixture, graph spec utils, and conversion utils. The 'unknown child node' parametrize case that looked potentially stale actually targets task_constraints (which deliberately retained parent/child), so there is no test regression. The two findings are minor style and documentation gaps with no runtime impact on the currently-targeted NVIDIA endpoint.

arena_env_graph_types.py — TaskSpec.params shares the acknowledged strict-mode dict[str,Any] schema limitation with SpatialRelationSpec.params but lacks the same TODO note.

Important Files Changed

Filename	Overview
isaaclab_arena/agentic_environment_generation/environment_generation_agent.py	Introduces EnvironmentGenerationAgent: builds asset/relation/task catalogues from registries, sends strict-mode JSON schema request, guards empty choices, handles DeepSeek reasoning_content routing — logic is clean and well-tested.
isaaclab_arena/agentic_environment_generation/agent_utils.py	Adds ping, build_strict_schema, apply_strict_constraints, and extract_response_text helpers; all defensive patterns from previous reviews (empty-choices guard, reasoning_content fallback) are present.
isaaclab_arena/environments/arena_env_graph_types.py	Refactors spatial constraint and task specs — renames parent/child → subject/reference, type → kind, and introduces shared TaskSpec/SpatialRelationSpec base classes; TaskSpec.params inherits the same dict[str,Any] strict-mode issue acknowledged (via TODO) only on SpatialRelationSpec.params.
isaaclab_arena/environments/graph_spec_utils.py	Updates assert_references_exist and assert_spatial_constraint_shapes to use subject/reference/kind; task constraints correctly retain parent/child naming (no rename needed there).
isaaclab_arena/tests/test_environment_generation_agent.py	Good unit test coverage including empty-choices guard and unescaped-control-char tolerance; _catalog helper has a dead relation_text parameter that should be removed for clarity.
isaaclab_arena/tests/test_arena_env_graph_spec.py	Correctly updates all field-name references from type→kind, parent→subject, child→reference for spatial constraints; 'unknown child node' parametrize case correctly targets task_constraints (parent/child unchanged) so no regression.

Sequence Diagram

sequenceDiagram
    participant User
    participant EGA as EnvironmentGenerationAgent
    participant Reg as AssetRegistry / TaskRegistry / RelationRegistry
    participant API as OpenAI-compatible Endpoint
    participant Spec as EnvironmentIntentSpec (Pydantic)

    User->>EGA: generate_spec(prompt)
    EGA->>Reg: build_asset_catalogue()
    EGA->>Reg: build_relation_catalogue()
    EGA->>Reg: build_task_catalogue() [agent_ready only]
    Reg-->>EGA: catalogues
    EGA->>API: "chat.completions.create(response_format=json_schema, strict=True)"
    API-->>EGA: choices[0].message (content or reasoning_content)
    EGA->>EGA: "extract_response_text() + json.loads(strict=False)"
    EGA->>Spec: model_validate(data)
    Spec->>Spec: _validate_kind_and_arity (SpatialRelationSpec)
    Spec->>Spec: _validate_agent_intent_tasks (agent-ready + required params)
    Spec-->>EGA: EnvironmentIntentSpec
    EGA-->>User: (EnvironmentIntentSpec, raw_json)

_{Reviews (24): Last reviewed commit: "Attempt CI fix 1: make isaaclab_arena/te..." | Re-trigger Greptile}

qianl-nv

@alexmillane I've rebased on #757. Now the TaskSpec and RelationSpec are shared with the ArenaEnvGraphSpec classes. Item spec still needs separate class, but I added a doc comment exaplaining its necessity. Please help review again.

xyao-nv · 2026-06-05T00:09:00Z

+from isaaclab_arena.assets.registries import TaskRegistry
+from isaaclab_arena.environments.arena_env_graph_types import RelationSpec, TaskSpec
+
+ItemRole = Literal["foreground", "distractor", "anchor"]


Can you explain the role diff between foreground vs distractor? It sounds like just Objects, ending up as Node with type=Object/Rigid in EnvGraph. Is there a need to treat them differently here? Both of them need to have RigidBodyAPI enabled, and go thru physics engine.

A foreground object could also be an ObjectSet.

I feel like this categories is very PnP task focused. I don't see the value of having 3 types (yet).

anchor is it just the same as ObjectReference?

Not exactly — they operate at different layers:

anchor (in ItemRole) is a semantic hint in the intent spec (pre-resolution). It tells the agent "this item is a reference surface for spatial relations, not a task object." The docstring notes it's rare because the background usually fills this role.

ObjectReference (in EnvGraph) is a structural node type in the resolved env graph — a specific sub-prim of a parent asset (e.g. maple_table_robolab_table is an object_reference under maple_table_robolab, with an explicit prim_path and object_type).

They're related: in practice an anchor-role item in the intent will likely resolve to an ObjectReference node that gets an is_anchor spatial constraint (as seen in the test fixture with maple_table_robolab_table). But the intent-level anchor role doesn't guarantee ObjectReference — it's guidance for the LLM agent, while ObjectReference is a resolved graph structure.

If the distinction isn't pulling its weight for the LLM (since background already covers most anchor cases), it might be worth reconsidering whether the anchor role adds clarity or confusion at the intent layer.

Good question! They are related but distinct concepts:

anchor (ItemRole) is a semantic role describing what purpose the item serves in the scene — specifically, reference surfaces that other objects are positioned relative to. The field description says: "anchor for reference surfaces (rare — the background usually covers this)."

ObjectReference (ArenaEnvGraphObjectReferenceNodeSpec) is a node type in the env graph representing a USD prim inside a parent background asset (with parent, prim_path, object_type fields). E.g., a door handle sub-prim within a cabinet asset.

An anchor role could resolve to an ObjectReference node in the final graph (if the anchor is a sub-prim of a background), but it could also be a regular object node. Conversely, an ObjectReference might not have the anchor role if it is being used as a foreground object (e.g., a drawer handle the robot needs to grasp).

That said, I agree with your broader point from the parent comment — the current roles feel PnP-centric and the anchor role seems rare given backgrounds already handle most reference surfaces. Worth revisiting whether this 3-way distinction pulls its weight as more task types get added.

Thanks for the suggestion. Had a similar discussion with Alex as well. Currently we do not explicitly use the foreground vs distractor downstream, so I go ahead and remove it.
The original intent is to keep distractor count as a way to control scene complexity, but as Alex point out also this does not require the distractor as an explicit agent output. It's something we can reason from the scene + task already.

Makes sense — if the distinction isn't used downstream and complexity can be inferred from the scene + task context, removing it keeps the spec cleaner. Thanks for the quick update!

Great call simplifying this — cleaner abstractions FTW. 👍

xyao-nv · 2026-06-05T00:33:30Z

+
+
+@dataclass
+class RelationCatalogue:


Hmm can you explain why do we need a RelationCatalogue? Sorry I'm lost.

Alex suggested here #718 (comment)
to have an AssetCatalogue to captured all the information passed to the agent to generate the asset prompt -- before converting it to a string. This gives readers more structured description of what's in the prompt.
Simply following this pattern for Relation/Task as well.

Makes sense — maintaining a consistent catalogue pattern across assets/relations/tasks is a good approach for readability and structure. Thanks for the context!

xyao-nv · 2026-06-05T01:13:00Z

    return cls
+
+
+def agent_ready(cls):


(Maybe out of the scope for this MR)
Apart from using decrator telling what agent to select from, is there any way at code level,
we can enforce agent that only uses the task args available in the constructor, instead of hallucinating from nowhere?

I'm actually doing this when building the task catalog (e..g. task name + required params name + docstring description of the task) which is passed to the agent as user prompt

https://github.com/isaac-sim/IsaacLab-Arena/pull/718/changes#diff-457c2de15d98c128d9d63a9715cfa841f9b53c531e6db83759373ef69fe7b861R184

I think that's probably the best we can do already pre-generation. We can of course do more post-validations after the agent returns, is that what you have in mind?

xyao-nv · 2026-06-05T01:28:27Z

+
+
+# TODO(qianl): remove this enum and check against relation registry for task constraints
+class ArenaEnvGraphTaskConstraintType(Enum):


Let's remove reach in this MR and bring it later. I am thinking if all REACH shall be derived in resolve_constraints.

The MR simply moved ArenaEnvGraphTaskConstraintType around a bit so the classes are better grouped in this file. I don't think we should remove it in this MR, as the MR doesn't change anything regarding task constraints. May be better if you make the updates in your resolve constraint MR?

Fair point — keeping it scoped to what this MR actually changes makes sense. I'll handle the removal in the resolve constraint MR. 👍

Item has no scale field; drop it from _MINIMAL_SPEC so the fixture matches the schema. Signed-off-by: Qian Lin <qianl@nvidia.com>

Raise RuntimeError when the endpoint returns HTTP 200 with no choices, matching ping() and avoiding IndexError on content-filter paths. Signed-off-by: Qian Lin <qianl@nvidia.com>

Bring back the CLI-override note and at_pose TODO dropped during the flat schema refactor. Signed-off-by: Qian Lin <qianl@nvidia.com>

Pin the runtime dependency to the v2 SDK line used by EnvironmentGenerationAgent's json_schema response_format. Signed-off-by: Qian Lin <qianl@nvidia.com>

Align ping and generate_spec error paths with repo conventions; update tests to expect AssertionError. Signed-off-by: Qian Lin <qianl@nvidia.com>

Require [] for static scenes and only include tasks the user prompt explicitly requests. Signed-off-by: Qian Lin <qianl@nvidia.com>

…tionSpec. Align graph state relation type naming with SpatialRelationSpec. Signed-off-by: Qian Lin <qianl@nvidia.com>

…ersion.py subprocess

qianl-nv added 7 commits May 26, 2026 18:39

Add LLMAgent code

0edda59

Add asset resolver

d4bdaa6

Resolver matches object/robot proposal from LLM (based on name/tag) to the exact entries in the arena regestry. It outputs an ArenaEnvGraphSpec with the nodes, tasks and initial state scene graph filled in based on LLM output + resolver match.

Move assert_unique_ids + assert_references_exist to post_init so it's…

3162272

… always called when loading from yaml/dict

Update schema to better support unary relation

06d4b0e

Simplify relation types in resolver

087c498

Fix copyright years

cf38ec3

Add unit test

7e5f58e

qianl-nv requested review from alexmillane, cvolkcvolk, peterd-NV, viiik-inside, xyao-nv and zhx06 as code owners May 26, 2026 14:40

isaaclab-review-bot Bot reviewed May 26, 2026

View reviewed changes

greptile-apps Bot reviewed May 26, 2026

View reviewed changes

Comment thread setup.py Outdated

Comment thread isaaclab_arena/llm_env_gen/llm_agent.py Outdated