Add Layout Validation#752
Conversation
Greptile SummaryThis PR adds structured layout validation via an immutable
Confidence Score: 5/5The change is safe to merge. Core ranking, fallback, and persistence logic is correct; the only open items are the assert-stripping pattern already flagged in the prior review thread. The refactoring is thorough and internally consistent. The bug fix in The assert-based validation in Important Files Changed
Sequence DiagramsequenceDiagram
participant Caller
participant PooledObjectPlacer
participant ObjectPlacer
participant ValidationReport
participant Serialization
Caller->>PooledObjectPlacer: __init__(objects, placer_params, layout_filter)
PooledObjectPlacer->>ObjectPlacer: ObjectPlacer(params, layout_filter)
PooledObjectPlacer->>PooledObjectPlacer: _solve_and_store(pool_size)
ObjectPlacer->>ObjectPlacer: _validate_geometry(positions, bboxes)
ObjectPlacer-->>ValidationReport: "ValidationReport(checks={no_overlap, on_relations})"
ObjectPlacer->>ObjectPlacer: _rank_candidates(self._accepts)
ObjectPlacer-->>PooledObjectPlacer: "PlacementResult(validation=report, ...)"
PooledObjectPlacer->>PooledObjectPlacer: accepts(result) via layout_filter(report)
PooledObjectPlacer->>PooledObjectPlacer: "store as PooledLayout(result, use_count=0)"
Caller->>PooledObjectPlacer: save(path)
PooledObjectPlacer->>Serialization: write_pool_document(path, PoolDocument)
Serialization-->>Caller: JSON file (atomic write)
Caller->>PooledObjectPlacer: load(path, objects, params)
PooledObjectPlacer->>Serialization: read_pool_document(path)
Serialization-->>PooledObjectPlacer: PoolDocument (validated)
PooledObjectPlacer->>Serialization: deserialize_layout(data, name_to_obj)
Serialization-->>PooledObjectPlacer: PlacementResult (reconstructed)
PooledObjectPlacer-->>Caller: PooledObjectPlacer (loaded, seed restored)
Caller->>PooledObjectPlacer: sample_with_replacement(count)
PooledObjectPlacer->>PooledObjectPlacer: _draw(rng, pooled_layouts) mark_used()
PooledObjectPlacer-->>Caller: list[PlacementResult]
Reviews (4): Last reviewed commit: "address comments" | Re-trigger Greptile |
There was a problem hiding this comment.
Review: Add Layout Validation
Good feature addition. The ValidationReport / LayoutFilter abstraction cleanly separates geometry checks from acceptance policy, and the serialization layer is well-guarded. A few items worth addressing before merge:
1. assert used for runtime validation in serialization (Medium)
layout_pool_serialization.py uses assert statements for all structural validation of loaded files. In optimized Python (python -O), assertions are stripped entirely, meaning a corrupt file would pass silently and produce wrong poses.
Suggestion: Replace assert with explicit if … raise ValueError(…) (or a dedicated LayoutPoolError) for the user-facing validation in read_pool_document, deserialize_layout, and PoolDocument.from_dict. Keep assert only for internal invariants that should never fire.
2. write_pool_document doesn't fsync before os.replace (Low)
The atomic write pattern (write → os.replace) is correct for crash-consistency on most filesystems, but without os.fsync(fd) on the temp file (or os.fsync on the directory), data can be lost on power failure before the kernel flushes to disk.
Suggestion: If this is used in long-running sim training where a crash mid-write would lose hours of solved poses, consider adding an fsync call:
with open(tmp_path, "w") as f:
f.write(payload)
f.flush()
os.fsync(f.fileno())
os.replace(tmp_path, path)3. _draw marks use_count but sample_with_replacement doesn't record which slot consumed it (Low)
PooledLayout.use_count increments on every draw (both consuming and non-consuming), which is fine for observability. However, there's no way to distinguish a sample_with_replacement use from a sample_without_replacement / sample_for_envs use. If analytics or debugging later needs that distinction, consider tracking served_count vs consumed_count separately.
4. PooledObjectPlacer.load() silently ignores pool_size from caller (Low)
load() overrides pool_size with max(1, loaded_count). A caller passing pool_size=200 might expect the loaded pool to refill up to 200 on the next solve, but the loaded count becomes the refill batch size. This is defensible but worth documenting in the docstring.
5. deserialize_layout requires set(orientations) <= set(positions) but not equality (Nit)
Positions require an exact object-set match against name_to_obj, but orientations only require a subset of positions. This means an object without a saved yaw silently gets orientations = {} for that object on load. The asymmetry is intentional (some objects have no yaw), but a brief comment explaining why the subset check rather than equality would help future readers.
6. Missing __init__.py re-exports for new public API (Low)
ValidationReport, LayoutFilter, default_layout_filter, and the serialization helpers aren't visible from the module-level __init__.py (if one exists). If these are intended as public API for downstream consumers, consider re-exporting them.
7. Test helper _report duplicated across test files (Nit)
test_heterogeneous_placement.py defines a local _report(passed) helper. Consider moving it to tests/utils/placement.py (which this PR already introduces) to avoid future duplication.
Summary
The design is solid: ValidationReport is frozen and fails-closed, the injectable LayoutFilter keeps the placer and pool aligned on the same acceptance criterion, and the serialization is thorough with atomic writes and structural guards. The test coverage is excellent.
Verdict: Approve with minor suggestions — the assert-for-validation concern (item 1) is the most impactful; the rest are quality-of-life improvements.
📝 Update (52c035c)
Reviewed 6 new commits (c897caf2..52c035cc). Changes include NotNextTo relation/loss strategy, chunked eval dispatch, layout pool save/load, PooledLayout use_count tracking, and stored_layouts accessor.
- ✅ Item 1 inline (orphan
.tmpcleanup): Fixed —write_pool_documentnow hastry/exceptwithunlink(missing_ok=True). - Item 1 (assert-based validation): Still present — not addressed in these commits.
- Items 2–7: Unchanged / not addressed (all low priority, fine to defer).
New code looks good overall:
NotNextToloss strategy is well-designed (min-of-two-escape-routes avoids dead plateaus) and comprehensively tested with solver integration.- Layout pool
save()/load()round-trip is thorough with structural guards and 20+ focused tests. - Chunked eval runner is pragmatic for OOM recovery; the TODO for metrics aggregation is appropriate.
- The
isaaclab_arena_env→task_descriptionsimplification is a clean decoupling.
No new blocking issues found. Previous verdict stands: Approve with minor suggestions (item 1 remains the only medium-priority concern).
📝 Update (3832af0)
Reviewed 1 new commit (52c035cc..3832af07). Minor wording refinement only:
- Warning messages now say "did not meet the pool's acceptance criteria" instead of "failed strict placement validation" — clearer messaging that aligns with the
LayoutFilterabstraction. - Test fixtures updated to match the
ValidationReport-based API.
No new issues. Previous concerns remain unchanged. Verdict: Approve with minor suggestions.
Update (249ea78): Incremental review of changes since 3832af0.
New: Variation System (isaaclab_arena/variations/)
Clean, well-structured two-tier variation framework:
BuildTimeVariationBase— sampled once before env composition (HDR map selection)RunTimeVariationBase— realized viaEventTermCfgon each reset (camera extrinsics)
The sampler hierarchy (SamplerBase → ContinuousSampler/ChoiceSampler) is minimal and extensible. The VariationBaseCfg.sampler_cfg.build() pattern keeps configuration declarative.
Camera extrinsics variation (camera_extrinsics_variation.py): Good design — snapshots nominal pose on first call, reapplies nominal + delta each reset to prevent drift accumulation. The ROS→OpenGL frame conversion is well-documented. Note the Isaac Lab quaternion bug is correctly handled and explicitly tested (test_isaaclab_bug_get_local_poses.py).
New: Layout Pool Serialization (layout_pool_serialization.py)
Solid defensive serialization with structural guards at every level:
PoolDocument.from_dictvalidates types, lengths, and key presencedeserialize_layoutrejects non-finite values, wrong-length positions, empty validation maps- Atomic write via temp file +
os.replace(previousfsynccomment still applies) - Non-finite JSON constants rejected on read (
NaN/Infinityliterals)
Graph Spec: cli_override_specs + Pydantic Migration
cli_override_specsin YAML allows declarative swappable flags (e.g.--object,--embodiment) — nice ergonomic improvement for users running different configs from the same graphat_pose→at_positionandin→onaligns the spec closer to what the solver actually supports- Validation errors now surface as
pydantic.ValidationErrorinstead of rawAssertionError— cleaner error messages for malformed YAML
Test Coverage
Thorough test additions:
test_layout_pool_serialization.py— covers round trips, structural rejections, IO errors, leaf-level guardstest_build_time_variations.py,test_run_time_variations.py,test_camera_extrinsics_variation.py— end-to-end throughArenaEnvBuildertest_variation_base.py— cfg plumbing, enable/disable,apply_cfgsampler rebuildtest_isaaclab_bug_get_local_poses.py— regression guard for the upstream quaternion layout issuetests/utils/placement.py— sharedlayout_signaturehelper reduces duplication
Minor Observations
-
pydantic>=2.0insetup.py: This is a significant new dependency. Confirm it doesn't conflict with Isaac Lab's own Pydantic usage (Isaac Lab pinspydanticin some versions). -
ChoiceSampler.sampleusestorch.randint: This bypasses the per-env RNG seeds (get_rngs). For the HDR variation (called once at build time), this is fine. If ever used in a per-env runtime path, it would break reproducibility. Worth a brief docstring note. -
SamplerBaseCfg.build()raisesNotImplementedError: The base class's default cfg will error if someone passes a bareVariationBaseCfg()without overridingsampler_cfg. This is acceptable but could be friendlier with an early check inVariationBase.__init__.
Verdict
Approve — well-tested, well-structured addition. The variation system is clean and the serialization layer is defensively written. Previous review items (assert vs raise in serialization, fsync) still apply.
52c035c to
3832af0
Compare
Signed-off-by: zhx06 <zihaox@nvidia.com>
Signed-off-by: zhx06 <zihaox@nvidia.com>
Signed-off-by: zhx06 <zihaox@nvidia.com>
Signed-off-by: zhx06 <zihaox@nvidia.com>
3832af0 to
249ea78
Compare
Summary
Add layout validation with save/load storage
Detailed description
ValidationReport, frozen and fails-closed, throughPlacementResult, ranking, and fallback paths.layout_filter; track per-layoutuse_count.PooledObjectPlacersave/load to reuse solved poses, persistingplacement_seedandhad_fallbacksfor reproducible sampling.layout_pool_serializationwith atomic, fail-loud writes and full load-time guards.