Skip to content

Add Layout Validation#752

Open
zhx06 wants to merge 5 commits into
mainfrom
zxiao/feature/layout_validation
Open

Add Layout Validation#752
zhx06 wants to merge 5 commits into
mainfrom
zxiao/feature/layout_validation

Conversation

@zhx06
Copy link
Copy Markdown
Collaborator

@zhx06 zhx06 commented Jun 2, 2026

Summary

Add layout validation with save/load storage

Detailed description

  • Add structured ValidationReport, frozen and fails-closed, through PlacementResult, ranking, and fallback paths.
  • Unify acceptance, selection, and diagnostics on an injectable layout_filter; track per-layout use_count.
  • Add PooledObjectPlacer save/load to reuse solved poses, persisting placement_seed and had_fallbacks for reproducible sampling.
  • Isolate the JSON schema in layout_pool_serialization with atomic, fail-loud writes and full load-time guards.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Jun 2, 2026

Greptile Summary

This PR adds structured layout validation via an immutable ValidationReport that replaces the previous bool success flag, an injectable LayoutFilter predicate that unifies acceptance across ranking, storage, and fallback paths, and save/load persistence for PooledObjectPlacer that serializes solved poses to JSON for reuse without re-solving.

  • ValidationReport is frozen and fails-closed (empty → not passed), with a with_check extension API for post-pool simulation checks; PlacementResult.success is now a derived property.
  • PooledLayout wraps each stored result with a use_count counter tracked across both consuming and non-consuming draws; _store_env_matched_results now correctly guards against setting _had_fallbacks when env_results is empty.
  • layout_pool_serialization isolates the JSON schema with atomic fail-loud writes and full structural load-time validation; the assert-based guards in this module (and now also in ValidationReport) are stripped by python -O, a concern already noted in the prior review thread for the serialization module.

Confidence Score: 5/5

The change is safe to merge. Core ranking, fallback, and persistence logic is correct; the only open items are the assert-stripping pattern already flagged in the prior review thread.

The refactoring is thorough and internally consistent. The bug fix in _store_env_matched_results (guarding against empty env_results before setting _had_fallbacks) is correct. The save/load round-trip is well-tested with seed restoration, consumed-layout preservation, heterogeneity mismatch detection, and object-set validation. No logic path leaves objects silently mis-placed or the pool silently empty.

The assert-based validation in placement_result.py (ValidationReport.__post_init__, with_check) and the return-type guard in pooled_object_placer.py (accepts()) follow the same pattern already flagged in layout_pool_serialization.py — all three files warrant the same fix if the team decides to address the optimised-build concern.

Important Files Changed

Filename Overview
isaaclab_arena/relations/placement_result.py Introduces ValidationReport (frozen, immutable checks map), LayoutFilter type alias, and default_layout_filter; replaces PlacementResult.success bool field with a derived property. Core design is sound but assert-based bool enforcement is stripped in optimised builds.
isaaclab_arena/relations/layout_pool_serialization.py New module owning the JSON schema for pool persistence. Atomic write with OSError cleanup and NaN guard are correct; assert-based structural checks (already flagged in prior thread) remain the open concern in debug vs. optimised builds.
isaaclab_arena/relations/pooled_object_placer.py Adds PooledLayout wrapper with use_count, injectable layout_filter, accepts(), _summarize_rejections, stored_layouts, and save/load. Bug fix: _store_env_matched_results now guards against empty env_results before setting _had_fallbacks; accepts() guard is assert-based and stripped in optimised builds.
isaaclab_arena/relations/object_placer.py Renames _validate_placement_validate_geometry (now returns ValidationReport), adds injectable layout_filter parameter, and changes _rank_candidates from a @staticmethod to an instance method using self._accepts. All call sites updated correctly.
isaaclab_arena/relations/placement_events.py Switches the fallback-detection condition from result.success to placement_pool.accepts(result), keeping acceptance logic consistent with the pool's filter. All mock pools in tests are updated to expose accepts().
isaaclab_arena/tests/test_layout_pool_serialization.py New test file covering round-trips, structural guards, NaN rejection, orphan-tmp cleanup, and all serialization edge cases with good parametrised coverage.
isaaclab_arena/tests/test_object_placer_reproducibility.py Extensive save/load round-trip tests, use_count tracking, stored_layouts post-validation flow, snapshot isolation from refills, and seed-restore coverage added.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant PooledObjectPlacer
    participant ObjectPlacer
    participant ValidationReport
    participant Serialization

    Caller->>PooledObjectPlacer: __init__(objects, placer_params, layout_filter)
    PooledObjectPlacer->>ObjectPlacer: ObjectPlacer(params, layout_filter)
    PooledObjectPlacer->>PooledObjectPlacer: _solve_and_store(pool_size)
    ObjectPlacer->>ObjectPlacer: _validate_geometry(positions, bboxes)
    ObjectPlacer-->>ValidationReport: "ValidationReport(checks={no_overlap, on_relations})"
    ObjectPlacer->>ObjectPlacer: _rank_candidates(self._accepts)
    ObjectPlacer-->>PooledObjectPlacer: "PlacementResult(validation=report, ...)"
    PooledObjectPlacer->>PooledObjectPlacer: accepts(result) via layout_filter(report)
    PooledObjectPlacer->>PooledObjectPlacer: "store as PooledLayout(result, use_count=0)"

    Caller->>PooledObjectPlacer: save(path)
    PooledObjectPlacer->>Serialization: write_pool_document(path, PoolDocument)
    Serialization-->>Caller: JSON file (atomic write)

    Caller->>PooledObjectPlacer: load(path, objects, params)
    PooledObjectPlacer->>Serialization: read_pool_document(path)
    Serialization-->>PooledObjectPlacer: PoolDocument (validated)
    PooledObjectPlacer->>Serialization: deserialize_layout(data, name_to_obj)
    Serialization-->>PooledObjectPlacer: PlacementResult (reconstructed)
    PooledObjectPlacer-->>Caller: PooledObjectPlacer (loaded, seed restored)

    Caller->>PooledObjectPlacer: sample_with_replacement(count)
    PooledObjectPlacer->>PooledObjectPlacer: _draw(rng, pooled_layouts) mark_used()
    PooledObjectPlacer-->>Caller: list[PlacementResult]
Loading

Reviews (4): Last reviewed commit: "address comments" | Re-trigger Greptile

Comment thread isaaclab_arena/relations/layout_pool_serialization.py
Comment thread isaaclab_arena/relations/layout_pool_serialization.py
Copy link
Copy Markdown
Contributor

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: Add Layout Validation

Good feature addition. The ValidationReport / LayoutFilter abstraction cleanly separates geometry checks from acceptance policy, and the serialization layer is well-guarded. A few items worth addressing before merge:


1. assert used for runtime validation in serialization (Medium)

layout_pool_serialization.py uses assert statements for all structural validation of loaded files. In optimized Python (python -O), assertions are stripped entirely, meaning a corrupt file would pass silently and produce wrong poses.

Suggestion: Replace assert with explicit if … raise ValueError(…) (or a dedicated LayoutPoolError) for the user-facing validation in read_pool_document, deserialize_layout, and PoolDocument.from_dict. Keep assert only for internal invariants that should never fire.


2. write_pool_document doesn't fsync before os.replace (Low)

The atomic write pattern (write → os.replace) is correct for crash-consistency on most filesystems, but without os.fsync(fd) on the temp file (or os.fsync on the directory), data can be lost on power failure before the kernel flushes to disk.

Suggestion: If this is used in long-running sim training where a crash mid-write would lose hours of solved poses, consider adding an fsync call:

with open(tmp_path, "w") as f:
    f.write(payload)
    f.flush()
    os.fsync(f.fileno())
os.replace(tmp_path, path)

3. _draw marks use_count but sample_with_replacement doesn't record which slot consumed it (Low)

PooledLayout.use_count increments on every draw (both consuming and non-consuming), which is fine for observability. However, there's no way to distinguish a sample_with_replacement use from a sample_without_replacement / sample_for_envs use. If analytics or debugging later needs that distinction, consider tracking served_count vs consumed_count separately.


4. PooledObjectPlacer.load() silently ignores pool_size from caller (Low)

load() overrides pool_size with max(1, loaded_count). A caller passing pool_size=200 might expect the loaded pool to refill up to 200 on the next solve, but the loaded count becomes the refill batch size. This is defensible but worth documenting in the docstring.


5. deserialize_layout requires set(orientations) <= set(positions) but not equality (Nit)

Positions require an exact object-set match against name_to_obj, but orientations only require a subset of positions. This means an object without a saved yaw silently gets orientations = {} for that object on load. The asymmetry is intentional (some objects have no yaw), but a brief comment explaining why the subset check rather than equality would help future readers.


6. Missing __init__.py re-exports for new public API (Low)

ValidationReport, LayoutFilter, default_layout_filter, and the serialization helpers aren't visible from the module-level __init__.py (if one exists). If these are intended as public API for downstream consumers, consider re-exporting them.


7. Test helper _report duplicated across test files (Nit)

test_heterogeneous_placement.py defines a local _report(passed) helper. Consider moving it to tests/utils/placement.py (which this PR already introduces) to avoid future duplication.


Summary

The design is solid: ValidationReport is frozen and fails-closed, the injectable LayoutFilter keeps the placer and pool aligned on the same acceptance criterion, and the serialization is thorough with atomic writes and structural guards. The test coverage is excellent.

Verdict: Approve with minor suggestions — the assert-for-validation concern (item 1) is the most impactful; the rest are quality-of-life improvements.


📝 Update (52c035c)

Reviewed 6 new commits (c897caf2..52c035cc). Changes include NotNextTo relation/loss strategy, chunked eval dispatch, layout pool save/load, PooledLayout use_count tracking, and stored_layouts accessor.

  • Item 1 inline (orphan .tmp cleanup): Fixed — write_pool_document now has try/except with unlink(missing_ok=True).
  • Item 1 (assert-based validation): Still present — not addressed in these commits.
  • Items 2–7: Unchanged / not addressed (all low priority, fine to defer).

New code looks good overall:

  • NotNextTo loss strategy is well-designed (min-of-two-escape-routes avoids dead plateaus) and comprehensively tested with solver integration.
  • Layout pool save()/load() round-trip is thorough with structural guards and 20+ focused tests.
  • Chunked eval runner is pragmatic for OOM recovery; the TODO for metrics aggregation is appropriate.
  • The isaaclab_arena_envtask_description simplification is a clean decoupling.

No new blocking issues found. Previous verdict stands: Approve with minor suggestions (item 1 remains the only medium-priority concern).


📝 Update (3832af0)

Reviewed 1 new commit (52c035cc..3832af07). Minor wording refinement only:

  • Warning messages now say "did not meet the pool's acceptance criteria" instead of "failed strict placement validation" — clearer messaging that aligns with the LayoutFilter abstraction.
  • Test fixtures updated to match the ValidationReport-based API.

No new issues. Previous concerns remain unchanged. Verdict: Approve with minor suggestions.


Update (249ea78): Incremental review of changes since 3832af0.

New: Variation System (isaaclab_arena/variations/)

Clean, well-structured two-tier variation framework:

  • BuildTimeVariationBase — sampled once before env composition (HDR map selection)
  • RunTimeVariationBase — realized via EventTermCfg on each reset (camera extrinsics)

The sampler hierarchy (SamplerBaseContinuousSampler/ChoiceSampler) is minimal and extensible. The VariationBaseCfg.sampler_cfg.build() pattern keeps configuration declarative.

Camera extrinsics variation (camera_extrinsics_variation.py): Good design — snapshots nominal pose on first call, reapplies nominal + delta each reset to prevent drift accumulation. The ROS→OpenGL frame conversion is well-documented. Note the Isaac Lab quaternion bug is correctly handled and explicitly tested (test_isaaclab_bug_get_local_poses.py).

New: Layout Pool Serialization (layout_pool_serialization.py)

Solid defensive serialization with structural guards at every level:

  • PoolDocument.from_dict validates types, lengths, and key presence
  • deserialize_layout rejects non-finite values, wrong-length positions, empty validation maps
  • Atomic write via temp file + os.replace (previous fsync comment still applies)
  • Non-finite JSON constants rejected on read (NaN/Infinity literals)

Graph Spec: cli_override_specs + Pydantic Migration

  • cli_override_specs in YAML allows declarative swappable flags (e.g. --object, --embodiment) — nice ergonomic improvement for users running different configs from the same graph
  • at_poseat_position and inon aligns the spec closer to what the solver actually supports
  • Validation errors now surface as pydantic.ValidationError instead of raw AssertionError — cleaner error messages for malformed YAML

Test Coverage

Thorough test additions:

  • test_layout_pool_serialization.py — covers round trips, structural rejections, IO errors, leaf-level guards
  • test_build_time_variations.py, test_run_time_variations.py, test_camera_extrinsics_variation.py — end-to-end through ArenaEnvBuilder
  • test_variation_base.py — cfg plumbing, enable/disable, apply_cfg sampler rebuild
  • test_isaaclab_bug_get_local_poses.py — regression guard for the upstream quaternion layout issue
  • tests/utils/placement.py — shared layout_signature helper reduces duplication

Minor Observations

  1. pydantic>=2.0 in setup.py: This is a significant new dependency. Confirm it doesn't conflict with Isaac Lab's own Pydantic usage (Isaac Lab pins pydantic in some versions).

  2. ChoiceSampler.sample uses torch.randint: This bypasses the per-env RNG seeds (get_rngs). For the HDR variation (called once at build time), this is fine. If ever used in a per-env runtime path, it would break reproducibility. Worth a brief docstring note.

  3. SamplerBaseCfg.build() raises NotImplementedError: The base class's default cfg will error if someone passes a bare VariationBaseCfg() without overriding sampler_cfg. This is acceptable but could be friendlier with an early check in VariationBase.__init__.

Verdict

Approve — well-tested, well-structured addition. The variation system is clean and the serialization layer is defensively written. Previous review items (assert vs raise in serialization, fsync) still apply.

@zhx06 zhx06 force-pushed the zxiao/feature/layout_validation branch 2 times, most recently from 52c035c to 3832af0 Compare June 3, 2026 19:37
zhx06 added 5 commits June 4, 2026 12:45
Signed-off-by: zhx06 <zihaox@nvidia.com>
Signed-off-by: zhx06 <zihaox@nvidia.com>
Signed-off-by: zhx06 <zihaox@nvidia.com>
Signed-off-by: zhx06 <zihaox@nvidia.com>
@zhx06 zhx06 force-pushed the zxiao/feature/layout_validation branch from 3832af0 to 249ea78 Compare June 4, 2026 23:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant