Skip to content

v0.3 Add EnvGenAgent for agentic env gen#718

Open
qianl-nv wants to merge 81 commits into
mainfrom
qianl/dev/agentic_llm
Open

v0.3 Add EnvGenAgent for agentic env gen#718
qianl-nv wants to merge 81 commits into
mainfrom
qianl/dev/agentic_llm

Conversation

@qianl-nv
Copy link
Copy Markdown
Collaborator

@qianl-nv qianl-nv commented May 26, 2026

Summary

Add EnvironmentGenerationAgent to convert user prompt to EnvironmentIntentSpec (JSON)

Detailed description

  • Reason: Adds first piece in the agentic environment generation pipeline: call agentic inference endpoint to infer task, relation, robot & object names from prompt

  • Changed:

    • Add a new EnvironmentGenerationAgent class that constructs the available asset/relation/task from registry into prompt and query the inference endpoint.
    • Add a new EnvironmentIntentSpec for specifying the output format of the agent. Task/Relations specs are shared with ArenaEnvGraph. A separate Item spec for foreground/distractor objects are separately defined since the agent returns query keywards instead of exact asset names from the registry.
    • Refactor and regroup ArenaEnvGraphSpec types file so the Task/Relation spec can be shared between agent Intent and the full ArenaEnvGraph.
    • Add agent-ready decorator to three tasks we plan to support for v0.3 milestone. EnvironmentGenerationAgent only use agent-ready tasks in prompt.
  • Impact: Task/Relation specs are refactored/fields renamed for clarify. This changes the ArenaEnvGraphSpec yaml fields.

To run: python -m isaaclab_arena.agentic_environment_generation.try_environment_intent_schema

qianl-nv added 7 commits May 26, 2026 18:39
Resolver matches object/robot proposal from LLM (based on name/tag) to
the exact entries in the arena regestry.
It outputs an ArenaEnvGraphSpec with the nodes, tasks and initial
state scene graph filled in based on LLM output + resolver match.
Copy link
Copy Markdown
Contributor

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


Update (24bc9bb):

Resolved:

  • pydantic>=2.0 added to RUNTIME_DEPS in setup.py
  • ✅ Old llm_env_gen module replaced by cleaner agentic_environment_generation module (JSON parsing issue no longer applicable)

Still present in new module:

  • ⚠️ assert self.api_key pattern persists in environment_generation_agent.py:220 — consider explicit ValueError for robustness under python -O
  • ⚠️ test_generate_spec_against_live_endpoint still missing @pytest.mark.skipif for NV_API_KEY

New changes look good:

  • Task parameterization via params: dict[str, str] is cleaner than subject/target
  • required_task_init_param_names() introspection is a nice pattern
  • Code organization with section comments improves readability

Update (e5feae7):

Good refactor — structured_output_utils.py deleted and utilities split into a leaner agent_utils.py (keeping only ping, build_strict_schema, extract_response_text). The heavy check_structured_output_support() pre-flight probe is removed from the constructor, which simplifies initialization and removes the extra LLM round-trip on startup.

Resolved from prior review:

  • ✅ Flaky live test (test_default_model_supports_structured_output) removed along with test_structured_output_utils.py

Still open:

  • ⚠️ assert self.api_key pattern still in environment_generation_agent.py (previously noted)

Looks good:

  • New test_agent_utils.py covers the retained helpers with clean mocked tests
  • Test fixture simplified (stub_openai now returns single response instead of side_effect list)---

Update (21aa0d2):

Major refactor — dataclass→Pydantic migration for env-graph spec, unified RelationSpec/TaskSpec shared types, new features (random_yaw_init, NotNextTo relation, CLI override specs, eval runner --chunk_size subprocess splitting).

Previous concerns:

  • ⚠️ assert self.api_key pattern — still present (unchanged file aside from targetparent rename in prompt strings)
  • ⚠️ Missing @pytest.mark.skipif on live test — not addressed in this batch

New changes look good:

  • Pydantic model_validator approach gives better error messages than manual assert parsing
  • RelationSpec / TaskSpec shared between agent intent and env-graph specs eliminates duplication
  • NotNextTo loss strategy (distance-to-nearest-exit) avoids flat plateaus — nice gradient design
  • random_yaw_init with conservative bbox enclosure is correctly integrated into solver, validation, and runtime placement paths
  • PooledObjectPlacer reproducibility fix (per-env random.Random instances via get_rngs) is solid
  • Eval runner --chunk_size subprocess dispatch is a pragmatic fix for host memory leaks
  • CLI override specs in YAML is a clean extensibility pattern
  • Comprehensive test coverage for all new features

Update (eb8f802):

Small cleanup pass (2 commits, 4 files). No new features or behavioral changes:

  • environment_intent_spec.py: Removed redundant ObjectRelationLibraryRegistry import (relation validation now handled by RelationSpec itself). Validator renamed _validate_catalogue_kinds_are_registered_validate_agent_intent_tasks and switched from raise ValueError to assert for consistency with the rest of the codebase. Removed dead sentence about "implicit intermediate state graphs" from docstring.
  • arena_env_graph_types.py: TaskSpec._validate_task_kind converted from model_validatorfield_validator("kind") (more idiomatic for single-field checks). RelationSpec._validate_kind_and_arity switched from raise ValueError to assert. _normalize_relation_params helper moved below RelationSpec (closer to its only caller ArenaEnvGraphSpatialConstraintSpec).
  • arena_env_graph_spec.py: RelationSpec/TaskSpec removed from __all__ re-exports (callers should import from arena_env_graph_types directly).
  • arena_env_graph_conversion_utils.py: Comment typo fix ("subject (child)" → "subject").

Note on assert vs raise: The shift from raise ValueError to assert in Pydantic validators is a style choice, but it means these checks will be silently skipped under python -O (optimized mode). Since this is test/development tooling unlikely to run optimized, it is acceptable here — just worth documenting as a convention.

Update (9dbd77f):

Schema flattening — ArenaEnvGraphTaskSpec now inherits from TaskSpec (instead of wrapping task: TaskSpec), and ArenaEnvGraphSpatialConstraintSpec now inherits from RelationSpec (instead of wrapping relation: RelationSpec). This removes one level of nesting in both the Python API and the YAML serialization format.

Also removes the task_description field from EnvironmentIntentSpec (redundant with the existing reasoning field).

Changes are clean:

  • Inheritance approach is idiomatic Pydantic — avoids nested .task.kind / .relation.subject access patterns
  • All call sites, validation utilities, conversion utilities, and tests updated consistently
  • YAML test fixture reformatted to match flat structure (no semantic change)
  • _minimal_env_graph_data() test helper updated, error-path tests adjusted to new flat keys
  • No behavioral regressions expected — this is purely structural

No new concerns. Previous notes still apply (⚠️ assert self.api_key pattern, ⚠️ live test skipif).

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 26, 2026

Greptile Summary

This PR introduces EnvironmentGenerationAgent, the first piece of the agentic env-gen pipeline: it queries an OpenAI-compatible structured-output endpoint with a prompt and returns a validated EnvironmentIntentSpec (background, embodiment, items, spatial relations, tasks). It also refactors ArenaEnvGraphSpec field names for clarity (typekind, parent/childsubject/reference, task_argsparams), sharing SpatialRelationSpec and TaskSpec base classes between the agent intent and the full env-graph spec.

  • New agentic_environment_generation package: EnvironmentGenerationAgent builds asset/relation/task catalogues from the live registries, munges the Pydantic schema into strict-mode format via apply_strict_constraints, and sends the request with response_format=json_schema. Defensive guards for empty choices and DeepSeek's reasoning_content channel are included.
  • Schema refactoring in arena_env_graph_types.py: ArenaEnvGraphSpatialConstraintSpecArenaEnvGraphSpatialRelationSpec with renamed fields; ArenaEnvGraphTaskSpec now extends the new shared TaskSpec; all conversion utils, tests, and YAML fixtures updated consistently.
  • @agent_ready decorator added to PickAndPlaceTask, OpenDoorTask, and CloseDoorTask; only decorated tasks appear in the agent prompt.

Confidence Score: 5/5

Safe to merge; the schema refactoring is consistent across types, tests, YAML fixtures, and conversion utils, and the new agent code is well-guarded.

The field renames (subject/reference/kind) are applied uniformly across all affected files including the YAML test fixture, graph spec utils, and conversion utils. The 'unknown child node' parametrize case that looked potentially stale actually targets task_constraints (which deliberately retained parent/child), so there is no test regression. The two findings are minor style and documentation gaps with no runtime impact on the currently-targeted NVIDIA endpoint.

arena_env_graph_types.py — TaskSpec.params shares the acknowledged strict-mode dict[str,Any] schema limitation with SpatialRelationSpec.params but lacks the same TODO note.

Important Files Changed

Filename Overview
isaaclab_arena/agentic_environment_generation/environment_generation_agent.py Introduces EnvironmentGenerationAgent: builds asset/relation/task catalogues from registries, sends strict-mode JSON schema request, guards empty choices, handles DeepSeek reasoning_content routing — logic is clean and well-tested.
isaaclab_arena/agentic_environment_generation/agent_utils.py Adds ping, build_strict_schema, apply_strict_constraints, and extract_response_text helpers; all defensive patterns from previous reviews (empty-choices guard, reasoning_content fallback) are present.
isaaclab_arena/environments/arena_env_graph_types.py Refactors spatial constraint and task specs — renames parent/child → subject/reference, type → kind, and introduces shared TaskSpec/SpatialRelationSpec base classes; TaskSpec.params inherits the same dict[str,Any] strict-mode issue acknowledged (via TODO) only on SpatialRelationSpec.params.
isaaclab_arena/environments/graph_spec_utils.py Updates assert_references_exist and assert_spatial_constraint_shapes to use subject/reference/kind; task constraints correctly retain parent/child naming (no rename needed there).
isaaclab_arena/tests/test_environment_generation_agent.py Good unit test coverage including empty-choices guard and unescaped-control-char tolerance; _catalog helper has a dead relation_text parameter that should be removed for clarity.
isaaclab_arena/tests/test_arena_env_graph_spec.py Correctly updates all field-name references from type→kind, parent→subject, child→reference for spatial constraints; 'unknown child node' parametrize case correctly targets task_constraints (parent/child unchanged) so no regression.

Sequence Diagram

sequenceDiagram
    participant User
    participant EGA as EnvironmentGenerationAgent
    participant Reg as AssetRegistry / TaskRegistry / RelationRegistry
    participant API as OpenAI-compatible Endpoint
    participant Spec as EnvironmentIntentSpec (Pydantic)

    User->>EGA: generate_spec(prompt)
    EGA->>Reg: build_asset_catalogue()
    EGA->>Reg: build_relation_catalogue()
    EGA->>Reg: build_task_catalogue() [agent_ready only]
    Reg-->>EGA: catalogues
    EGA->>API: "chat.completions.create(response_format=json_schema, strict=True)"
    API-->>EGA: choices[0].message (content or reasoning_content)
    EGA->>EGA: "extract_response_text() + json.loads(strict=False)"
    EGA->>Spec: model_validate(data)
    Spec->>Spec: _validate_kind_and_arity (SpatialRelationSpec)
    Spec->>Spec: _validate_agent_intent_tasks (agent-ready + required params)
    Spec-->>EGA: EnvironmentIntentSpec
    EGA-->>User: (EnvironmentIntentSpec, raw_json)
Loading

Reviews (24): Last reviewed commit: "Attempt CI fix 1: make isaaclab_arena/te..." | Re-trigger Greptile

Comment thread setup.py Outdated
Comment thread isaaclab_arena/llm_env_gen/llm_agent.py Outdated
Comment thread isaaclab_arena/llm_env_gen/schema.py Outdated
Comment thread isaaclab_arena/llm_env_gen/resolver.py Outdated
Comment thread isaaclab_arena/llm_env_gen/llm_agent.py Outdated
Comment thread isaaclab_arena/llm_env_gen/llm_agent.py Outdated
Comment thread isaaclab_arena/llm_env_gen/llm_agent.py Outdated
Comment thread isaaclab_arena/environments/agentic_env_gen/env_gen_agent.py Outdated
Comment thread isaaclab_arena/llm_env_gen/llm_agent.py Outdated
Comment thread isaaclab_arena/llm_env_gen/llm_agent.py Outdated
Comment thread isaaclab_arena/llm_env_gen/llm_agent.py Outdated
Comment thread isaaclab_arena/llm_env_gen/llm_agent.py Outdated
Copy link
Copy Markdown
Collaborator Author

@qianl-nv qianl-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexmillane I've rebased on #757. Now the TaskSpec and RelationSpec are shared with the ArenaEnvGraphSpec classes. Item spec still needs separate class, but I added a doc comment exaplaining its necessity. Please help review again.

Comment thread isaaclab_arena/agentic_environment_generation/agent_utils.py Outdated
Comment thread isaaclab_arena/agentic_environment_generation/environment_intent_spec.py Outdated
from isaaclab_arena.assets.registries import TaskRegistry
from isaaclab_arena.environments.arena_env_graph_types import RelationSpec, TaskSpec

ItemRole = Literal["foreground", "distractor", "anchor"]
Copy link
Copy Markdown
Collaborator

@xyao-nv xyao-nv Jun 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain the role diff between foreground vs distractor? It sounds like just Objects, ending up as Node with type=Object/Rigid in EnvGraph. Is there a need to treat them differently here? Both of them need to have RigidBodyAPI enabled, and go thru physics engine.

A foreground object could also be an ObjectSet.

I feel like this categories is very PnP task focused. I don't see the value of having 3 types (yet).

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

anchor is it just the same as ObjectReference?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exactly — they operate at different layers:

  • anchor (in ItemRole) is a semantic hint in the intent spec (pre-resolution). It tells the agent "this item is a reference surface for spatial relations, not a task object." The docstring notes it's rare because the background usually fills this role.

  • ObjectReference (in EnvGraph) is a structural node type in the resolved env graph — a specific sub-prim of a parent asset (e.g. maple_table_robolab_table is an object_reference under maple_table_robolab, with an explicit prim_path and object_type).

They're related: in practice an anchor-role item in the intent will likely resolve to an ObjectReference node that gets an is_anchor spatial constraint (as seen in the test fixture with maple_table_robolab_table). But the intent-level anchor role doesn't guarantee ObjectReference — it's guidance for the LLM agent, while ObjectReference is a resolved graph structure.

If the distinction isn't pulling its weight for the LLM (since background already covers most anchor cases), it might be worth reconsidering whether the anchor role adds clarity or confusion at the intent layer.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question! They are related but distinct concepts:

  • anchor (ItemRole) is a semantic role describing what purpose the item serves in the scene — specifically, reference surfaces that other objects are positioned relative to. The field description says: "anchor for reference surfaces (rare — the background usually covers this)."

  • ObjectReference (ArenaEnvGraphObjectReferenceNodeSpec) is a node type in the env graph representing a USD prim inside a parent background asset (with parent, prim_path, object_type fields). E.g., a door handle sub-prim within a cabinet asset.

An anchor role could resolve to an ObjectReference node in the final graph (if the anchor is a sub-prim of a background), but it could also be a regular object node. Conversely, an ObjectReference might not have the anchor role if it is being used as a foreground object (e.g., a drawer handle the robot needs to grasp).

That said, I agree with your broader point from the parent comment — the current roles feel PnP-centric and the anchor role seems rare given backgrounds already handle most reference surfaces. Worth revisiting whether this 3-way distinction pulls its weight as more task types get added.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion. Had a similar discussion with Alex as well. Currently we do not explicitly use the foreground vs distractor downstream, so I go ahead and remove it.
The original intent is to keep distractor count as a way to control scene complexity, but as Alex point out also this does not require the distractor as an explicit agent output. It's something we can reason from the scene + task already.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense — if the distinction isn't used downstream and complexity can be inferred from the scene + task context, removing it keeps the spec cleaner. Thanks for the quick update!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great call simplifying this — cleaner abstractions FTW. 👍

Comment thread isaaclab_arena/environments/arena_env_graph_types.py Outdated


@dataclass
class RelationCatalogue:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm can you explain why do we need a RelationCatalogue? Sorry I'm lost.

Copy link
Copy Markdown
Collaborator Author

@qianl-nv qianl-nv Jun 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alex suggested here #718 (comment)
to have an AssetCatalogue to captured all the information passed to the agent to generate the asset prompt -- before converting it to a string. This gives readers more structured description of what's in the prompt.
Simply following this pattern for Relation/Task as well.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense — maintaining a consistent catalogue pattern across assets/relations/tasks is a good approach for readability and structure. Thanks for the context!

return cls


def agent_ready(cls):
Copy link
Copy Markdown
Collaborator

@xyao-nv xyao-nv Jun 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Maybe out of the scope for this MR)
Apart from using decrator telling what agent to select from, is there any way at code level,
we can enforce agent that only uses the task args available in the constructor, instead of hallucinating from nowhere?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually doing this when building the task catalog (e..g. task name + required params name + docstring description of the task) which is passed to the agent as user prompt

https://github.com/isaac-sim/IsaacLab-Arena/pull/718/changes#diff-457c2de15d98c128d9d63a9715cfa841f9b53c531e6db83759373ef69fe7b861R184

I think that's probably the best we can do already pre-generation. We can of course do more post-validations after the agent returns, is that what you have in mind?

Comment thread isaaclab_arena/environments/arena_env_graph_types.py Outdated
Comment thread isaaclab_arena/environments/arena_env_graph_types.py Outdated
Comment thread isaaclab_arena/environments/arena_env_graph_types.py Outdated


# TODO(qianl): remove this enum and check against relation registry for task constraints
class ArenaEnvGraphTaskConstraintType(Enum):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove reach in this MR and bring it later. I am thinking if all REACH shall be derived in resolve_constraints.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MR simply moved ArenaEnvGraphTaskConstraintType around a bit so the classes are better grouped in this file. I don't think we should remove it in this MR, as the MR doesn't change anything regarding task constraints. May be better if you make the updates in your resolve constraint MR?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point — keeping it scoped to what this MR actually changes makes sense. I'll handle the removal in the resolve constraint MR. 👍

Comment thread setup.py Outdated
qianl-nv added 12 commits June 5, 2026 13:47
Item has no scale field; drop it from _MINIMAL_SPEC so the fixture matches the schema.

Signed-off-by: Qian Lin <qianl@nvidia.com>
Raise RuntimeError when the endpoint returns HTTP 200 with no choices, matching ping() and avoiding IndexError on content-filter paths.

Signed-off-by: Qian Lin <qianl@nvidia.com>
Bring back the CLI-override note and at_pose TODO dropped during the flat schema refactor.

Signed-off-by: Qian Lin <qianl@nvidia.com>
Pin the runtime dependency to the v2 SDK line used by EnvironmentGenerationAgent's json_schema response_format.

Signed-off-by: Qian Lin <qianl@nvidia.com>
Align ping and generate_spec error paths with repo conventions; update tests to expect AssertionError.

Signed-off-by: Qian Lin <qianl@nvidia.com>
Require [] for static scenes and only include tasks the user prompt explicitly requests.

Signed-off-by: Qian Lin <qianl@nvidia.com>
…tionSpec.

Align graph state relation type naming with SpatialRelationSpec.

Signed-off-by: Qian Lin <qianl@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants