v0.3 Add EnvGenAgent for agentic env gen#718
Conversation
Resolver matches object/robot proposal from LLM (based on name/tag) to the exact entries in the arena regestry. It outputs an ArenaEnvGraphSpec with the nodes, tasks and initial state scene graph filled in based on LLM output + resolver match.
… always called when loading from yaml/dict
There was a problem hiding this comment.
Update (24bc9bb):
Resolved:
- ✅
pydantic>=2.0added to RUNTIME_DEPS insetup.py - ✅ Old
llm_env_genmodule replaced by cleaneragentic_environment_generationmodule (JSON parsing issue no longer applicable)
Still present in new module:
⚠️ assert self.api_keypattern persists inenvironment_generation_agent.py:220— consider explicitValueErrorfor robustness underpython -O⚠️ test_generate_spec_against_live_endpointstill missing@pytest.mark.skipifforNV_API_KEY
New changes look good:
- Task parameterization via
params: dict[str, str]is cleaner thansubject/target required_task_init_param_names()introspection is a nice pattern- Code organization with section comments improves readability
Update (e5feae7):
Good refactor — structured_output_utils.py deleted and utilities split into a leaner agent_utils.py (keeping only ping, build_strict_schema, extract_response_text). The heavy check_structured_output_support() pre-flight probe is removed from the constructor, which simplifies initialization and removes the extra LLM round-trip on startup.
Resolved from prior review:
- ✅ Flaky live test (
test_default_model_supports_structured_output) removed along withtest_structured_output_utils.py
Still open:
⚠️ assert self.api_keypattern still inenvironment_generation_agent.py(previously noted)
Looks good:
- New
test_agent_utils.pycovers the retained helpers with clean mocked tests - Test fixture simplified (
stub_openainow returns single response instead of side_effect list)---
Update (21aa0d2):
Major refactor — dataclass→Pydantic migration for env-graph spec, unified RelationSpec/TaskSpec shared types, new features (random_yaw_init, NotNextTo relation, CLI override specs, eval runner --chunk_size subprocess splitting).
Previous concerns:
⚠️ assert self.api_keypattern — still present (unchanged file aside fromtarget→parentrename in prompt strings)⚠️ Missing@pytest.mark.skipifon live test — not addressed in this batch
New changes look good:
- Pydantic
model_validatorapproach gives better error messages than manualassertparsing RelationSpec/TaskSpecshared between agent intent and env-graph specs eliminates duplicationNotNextToloss strategy (distance-to-nearest-exit) avoids flat plateaus — nice gradient designrandom_yaw_initwith conservative bbox enclosure is correctly integrated into solver, validation, and runtime placement pathsPooledObjectPlacerreproducibility fix (per-envrandom.Randominstances viaget_rngs) is solid- Eval runner
--chunk_sizesubprocess dispatch is a pragmatic fix for host memory leaks - CLI override specs in YAML is a clean extensibility pattern
- Comprehensive test coverage for all new features
Update (eb8f802):
Small cleanup pass (2 commits, 4 files). No new features or behavioral changes:
environment_intent_spec.py: Removed redundantObjectRelationLibraryRegistryimport (relation validation now handled byRelationSpecitself). Validator renamed_validate_catalogue_kinds_are_registered→_validate_agent_intent_tasksand switched fromraise ValueErrortoassertfor consistency with the rest of the codebase. Removed dead sentence about "implicit intermediate state graphs" from docstring.arena_env_graph_types.py:TaskSpec._validate_task_kindconverted frommodel_validator→field_validator("kind")(more idiomatic for single-field checks).RelationSpec._validate_kind_and_arityswitched fromraise ValueErrortoassert._normalize_relation_paramshelper moved belowRelationSpec(closer to its only callerArenaEnvGraphSpatialConstraintSpec).arena_env_graph_spec.py:RelationSpec/TaskSpecremoved from__all__re-exports (callers should import fromarena_env_graph_typesdirectly).arena_env_graph_conversion_utils.py: Comment typo fix ("subject (child)" → "subject").
Note on assert vs raise: The shift from raise ValueError to assert in Pydantic validators is a style choice, but it means these checks will be silently skipped under python -O (optimized mode). Since this is test/development tooling unlikely to run optimized, it is acceptable here — just worth documenting as a convention.
Update (9dbd77f):
Schema flattening — ArenaEnvGraphTaskSpec now inherits from TaskSpec (instead of wrapping task: TaskSpec), and ArenaEnvGraphSpatialConstraintSpec now inherits from RelationSpec (instead of wrapping relation: RelationSpec). This removes one level of nesting in both the Python API and the YAML serialization format.
Also removes the task_description field from EnvironmentIntentSpec (redundant with the existing reasoning field).
Changes are clean:
- Inheritance approach is idiomatic Pydantic — avoids nested
.task.kind/.relation.subjectaccess patterns - All call sites, validation utilities, conversion utilities, and tests updated consistently
- YAML test fixture reformatted to match flat structure (no semantic change)
_minimal_env_graph_data()test helper updated, error-path tests adjusted to new flat keys- No behavioral regressions expected — this is purely structural
No new concerns. Previous notes still apply (assert self.api_key pattern, skipif).
Greptile SummaryThis PR introduces
Confidence Score: 5/5Safe to merge; the schema refactoring is consistent across types, tests, YAML fixtures, and conversion utils, and the new agent code is well-guarded. The field renames (subject/reference/kind) are applied uniformly across all affected files including the YAML test fixture, graph spec utils, and conversion utils. The 'unknown child node' parametrize case that looked potentially stale actually targets task_constraints (which deliberately retained parent/child), so there is no test regression. The two findings are minor style and documentation gaps with no runtime impact on the currently-targeted NVIDIA endpoint. arena_env_graph_types.py — TaskSpec.params shares the acknowledged strict-mode dict[str,Any] schema limitation with SpatialRelationSpec.params but lacks the same TODO note. Important Files Changed
Sequence DiagramsequenceDiagram
participant User
participant EGA as EnvironmentGenerationAgent
participant Reg as AssetRegistry / TaskRegistry / RelationRegistry
participant API as OpenAI-compatible Endpoint
participant Spec as EnvironmentIntentSpec (Pydantic)
User->>EGA: generate_spec(prompt)
EGA->>Reg: build_asset_catalogue()
EGA->>Reg: build_relation_catalogue()
EGA->>Reg: build_task_catalogue() [agent_ready only]
Reg-->>EGA: catalogues
EGA->>API: "chat.completions.create(response_format=json_schema, strict=True)"
API-->>EGA: choices[0].message (content or reasoning_content)
EGA->>EGA: "extract_response_text() + json.loads(strict=False)"
EGA->>Spec: model_validate(data)
Spec->>Spec: _validate_kind_and_arity (SpatialRelationSpec)
Spec->>Spec: _validate_agent_intent_tasks (agent-ready + required params)
Spec-->>EGA: EnvironmentIntentSpec
EGA-->>User: (EnvironmentIntentSpec, raw_json)
Reviews (24): Last reviewed commit: "Attempt CI fix 1: make isaaclab_arena/te..." | Re-trigger Greptile |
qianl-nv
left a comment
There was a problem hiding this comment.
@alexmillane I've rebased on #757. Now the TaskSpec and RelationSpec are shared with the ArenaEnvGraphSpec classes. Item spec still needs separate class, but I added a doc comment exaplaining its necessity. Please help review again.
| from isaaclab_arena.assets.registries import TaskRegistry | ||
| from isaaclab_arena.environments.arena_env_graph_types import RelationSpec, TaskSpec | ||
|
|
||
| ItemRole = Literal["foreground", "distractor", "anchor"] |
There was a problem hiding this comment.
Can you explain the role diff between foreground vs distractor? It sounds like just Objects, ending up as Node with type=Object/Rigid in EnvGraph. Is there a need to treat them differently here? Both of them need to have RigidBodyAPI enabled, and go thru physics engine.
A foreground object could also be an ObjectSet.
I feel like this categories is very PnP task focused. I don't see the value of having 3 types (yet).
There was a problem hiding this comment.
anchor is it just the same as ObjectReference?
There was a problem hiding this comment.
Not exactly — they operate at different layers:
-
anchor(inItemRole) is a semantic hint in the intent spec (pre-resolution). It tells the agent "this item is a reference surface for spatial relations, not a task object." The docstring notes it's rare because the background usually fills this role. -
ObjectReference(inEnvGraph) is a structural node type in the resolved env graph — a specific sub-prim of a parent asset (e.g.maple_table_robolab_tableis anobject_referenceundermaple_table_robolab, with an explicitprim_pathandobject_type).
They're related: in practice an anchor-role item in the intent will likely resolve to an ObjectReference node that gets an is_anchor spatial constraint (as seen in the test fixture with maple_table_robolab_table). But the intent-level anchor role doesn't guarantee ObjectReference — it's guidance for the LLM agent, while ObjectReference is a resolved graph structure.
If the distinction isn't pulling its weight for the LLM (since background already covers most anchor cases), it might be worth reconsidering whether the anchor role adds clarity or confusion at the intent layer.
There was a problem hiding this comment.
Good question! They are related but distinct concepts:
-
anchor(ItemRole) is a semantic role describing what purpose the item serves in the scene — specifically, reference surfaces that other objects are positioned relative to. The field description says: "anchor for reference surfaces (rare — the background usually covers this)." -
ObjectReference(ArenaEnvGraphObjectReferenceNodeSpec) is a node type in the env graph representing a USD prim inside a parent background asset (withparent,prim_path,object_typefields). E.g., a door handle sub-prim within a cabinet asset.
An anchor role could resolve to an ObjectReference node in the final graph (if the anchor is a sub-prim of a background), but it could also be a regular object node. Conversely, an ObjectReference might not have the anchor role if it is being used as a foreground object (e.g., a drawer handle the robot needs to grasp).
That said, I agree with your broader point from the parent comment — the current roles feel PnP-centric and the anchor role seems rare given backgrounds already handle most reference surfaces. Worth revisiting whether this 3-way distinction pulls its weight as more task types get added.
There was a problem hiding this comment.
Thanks for the suggestion. Had a similar discussion with Alex as well. Currently we do not explicitly use the foreground vs distractor downstream, so I go ahead and remove it.
The original intent is to keep distractor count as a way to control scene complexity, but as Alex point out also this does not require the distractor as an explicit agent output. It's something we can reason from the scene + task already.
There was a problem hiding this comment.
Makes sense — if the distinction isn't used downstream and complexity can be inferred from the scene + task context, removing it keeps the spec cleaner. Thanks for the quick update!
There was a problem hiding this comment.
Great call simplifying this — cleaner abstractions FTW. 👍
|
|
||
|
|
||
| @dataclass | ||
| class RelationCatalogue: |
There was a problem hiding this comment.
Hmm can you explain why do we need a RelationCatalogue? Sorry I'm lost.
There was a problem hiding this comment.
Alex suggested here #718 (comment)
to have an AssetCatalogue to captured all the information passed to the agent to generate the asset prompt -- before converting it to a string. This gives readers more structured description of what's in the prompt.
Simply following this pattern for Relation/Task as well.
There was a problem hiding this comment.
Makes sense — maintaining a consistent catalogue pattern across assets/relations/tasks is a good approach for readability and structure. Thanks for the context!
| return cls | ||
|
|
||
|
|
||
| def agent_ready(cls): |
There was a problem hiding this comment.
(Maybe out of the scope for this MR)
Apart from using decrator telling what agent to select from, is there any way at code level,
we can enforce agent that only uses the task args available in the constructor, instead of hallucinating from nowhere?
There was a problem hiding this comment.
I'm actually doing this when building the task catalog (e..g. task name + required params name + docstring description of the task) which is passed to the agent as user prompt
I think that's probably the best we can do already pre-generation. We can of course do more post-validations after the agent returns, is that what you have in mind?
|
|
||
|
|
||
| # TODO(qianl): remove this enum and check against relation registry for task constraints | ||
| class ArenaEnvGraphTaskConstraintType(Enum): |
There was a problem hiding this comment.
Let's remove reach in this MR and bring it later. I am thinking if all REACH shall be derived in resolve_constraints.
There was a problem hiding this comment.
The MR simply moved ArenaEnvGraphTaskConstraintType around a bit so the classes are better grouped in this file. I don't think we should remove it in this MR, as the MR doesn't change anything regarding task constraints. May be better if you make the updates in your resolve constraint MR?
There was a problem hiding this comment.
Fair point — keeping it scoped to what this MR actually changes makes sense. I'll handle the removal in the resolve constraint MR. 👍
Item has no scale field; drop it from _MINIMAL_SPEC so the fixture matches the schema. Signed-off-by: Qian Lin <qianl@nvidia.com>
Raise RuntimeError when the endpoint returns HTTP 200 with no choices, matching ping() and avoiding IndexError on content-filter paths. Signed-off-by: Qian Lin <qianl@nvidia.com>
Bring back the CLI-override note and at_pose TODO dropped during the flat schema refactor. Signed-off-by: Qian Lin <qianl@nvidia.com>
Pin the runtime dependency to the v2 SDK line used by EnvironmentGenerationAgent's json_schema response_format. Signed-off-by: Qian Lin <qianl@nvidia.com>
Align ping and generate_spec error paths with repo conventions; update tests to expect AssertionError. Signed-off-by: Qian Lin <qianl@nvidia.com>
Require [] for static scenes and only include tasks the user prompt explicitly requests. Signed-off-by: Qian Lin <qianl@nvidia.com>
…tionSpec. Align graph state relation type naming with SpatialRelationSpec. Signed-off-by: Qian Lin <qianl@nvidia.com>
…ersion.py subprocess
Summary
Add EnvironmentGenerationAgent to convert user prompt to EnvironmentIntentSpec (JSON)
Detailed description
Reason: Adds first piece in the agentic environment generation pipeline: call agentic inference endpoint to infer task, relation, robot & object names from prompt
Changed:
Impact: Task/Relation specs are refactored/fields renamed for clarify. This changes the ArenaEnvGraphSpec yaml fields.
To run: python -m isaaclab_arena.agentic_environment_generation.try_environment_intent_schema