feat: add N-D raster dimension query and manipulation functions #750
Draft
james-willis wants to merge 18 commits into apache:main from
Replace the legacy 2D raster schema with the N-dimensional layout:
- Remove metadata sub-struct (width/height/upperleft/scale/skew)
- Add transform: List<Float64> (6-element GDAL GeoTransform)
- Add x_dim/y_dim: Utf8 for explicit spatial dimension declaration
- Flatten band struct: name, dim_names, shape, data_type, nodata, strides, offset, outdb_uri, data
- Remove StorageType enum (OutDb indicated by non-null outdb_uri)
- Remove metadata_indices/band_metadata_indices modules
- Update raster_indices and band_indices for new layout
Downstream crates will not compile until subsequent commits update them to use the new schema.
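The 6-element transform list takes over the role of the old upperleft/scale/skew fields. A minimal sketch of how such a list maps a pixel position to world coordinates, assuming the standard GDAL GeoTransform ordering (origin x, pixel width, row rotation, origin y, column rotation, pixel height). `pixel_to_world` is a hypothetical helper for illustration, not a function from this PR:

```rust
/// Map a pixel position (col, row) to world coordinates.
/// Assumes `transform` follows the GDAL GeoTransform ordering:
/// [origin_x, pixel_width, row_rotation, origin_y, col_rotation, pixel_height].
/// Hypothetical helper; returns None unless exactly 6 coefficients are given.
fn pixel_to_world(transform: &[f64], col: f64, row: f64) -> Option<(f64, f64)> {
    if transform.len() != 6 {
        return None;
    }
    let x = transform[0] + col * transform[1] + row * transform[2];
    let y = transform[3] + col * transform[4] + row * transform[5];
    Some((x, y))
}
```

For a north-up raster with origin (100, 50) and 2-unit pixels, `pixel_to_world(&[100.0, 2.0, 0.0, 50.0, 0.0, -2.0], 3.0, 4.0)` yields `(106.0, 42.0)`.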
Replace the legacy trait hierarchy with N-D raster types:
- traits.rs: RasterRef (transform, x_dim, y_dim, width/height derived from band dims), BandRef (ndim, dim_names, shape, nd_buffer, contiguous_data returning Cow), NdBuffer struct
- array.rs: RasterStructArray reads new schema (transform list, x_dim/y_dim, flattened band fields with nested lists for dim_names/shape/strides), RasterRefImpl with band_boxed()
- builder.rs: start_raster/start_band with N-D params, plus start_raster_2d/start_band_2d convenience for legacy 2D usage
- affine_transformation.rs: AffineMatrix::from_transform(&[f64]) replaces from_metadata(), free functions accept &dyn RasterRef
- display.rs: updated for new trait interface
Downstream crates will not compile until test utilities and RS_* functions are updated in subsequent commits.
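Width and height are described here as derived from band dimensions rather than stored. One way that lookup could work, assuming the first band is consulted and that x_dim/y_dim name entries in that band's dim_names. The `Raster` and `Band` structs below are simplified stand-ins for illustration, not the actual RasterRef/BandRef traits:

```rust
/// Simplified stand-in for a band: parallel lists of dimension names and sizes.
struct Band {
    dim_names: Vec<String>,
    shape: Vec<u64>,
}

/// Simplified stand-in for a raster that declares its spatial dimensions by name.
struct Raster {
    x_dim: String,
    y_dim: String,
    bands: Vec<Band>,
}

impl Band {
    /// Size of the named dimension, or None if the band has no such dimension.
    fn dim_size(&self, name: &str) -> Option<u64> {
        let idx = self.dim_names.iter().position(|d| d.as_str() == name)?;
        self.shape.get(idx).copied()
    }
}

impl Raster {
    /// Width from the first band's x dimension (assumption); None with no bands.
    fn width(&self) -> Option<u64> {
        self.bands.first()?.dim_size(&self.x_dim)
    }

    /// Height from the first band's y dimension (assumption); None with no bands.
    fn height(&self) -> Option<u64> {
        self.bands.first()?.dim_size(&self.y_dim)
    }
}
```

Because the spatial dimensions are looked up by name, a band shaped [time, lat, lon] works as long as the raster declares x_dim = "lon" and y_dim = "lat".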
… test utils
Update sedona-raster core types and sedona-testing helpers for the N-D raster schema:
sedona-raster:
- traits.rs: RasterRef with transform/x_dim/y_dim/width/height, BandRef with ndim/dim_names/shape/nd_buffer/contiguous_data(Cow), NdBuffer struct. band() returns Box<dyn BandRef>.
- array.rs: RasterStructArray reads new flattened schema with nested lists for dim_names/shape/strides
- builder.rs: start_raster/start_band with N-D params, plus start_raster_2d/start_band_2d convenience methods
- affine_transformation.rs: from_transform(&[f64]) replaces from_metadata(), free functions accept &dyn RasterRef
- display.rs: updated for new trait interface
sedona-testing:
- All raster helpers updated to use new builder API
- assert_raster_equal compares transforms, dims, shapes, data
- generate_multi_band_raster uses start_band_2d
Also fixes x_dim/y_dim schema to use Utf8View (matching builder).
sedona-raster-functions is not yet updated; that follows in the next commit.
Mechanically migrate all 14 RS_* function files from the legacy raster API to the new N-D trait interface:
- raster.metadata().width/height → raster.width/height().unwrap()
- raster.metadata().upper_left_x/scale_x/etc → raster.transform()[i]
- raster.bands().len/band(n) → raster.num_bands/band(n-1)
- band.metadata().data_type/nodata/storage_type → band.data_type/nodata/outdb_uri
- band.data() → band.contiguous_data()
- AffineMatrix::from_metadata → from_transform
- Remove StorageType, RasterMetadata, BandMetadata imports
- Update all test helpers to use start_raster_2d/start_band_2d
All 140 existing tests pass with identical outputs.
Add 7 tests covering new N-dimensional raster capabilities:
- Non-standard spatial dim names (lon/lat): width()/height() work
- Mixed dimensionality: 3D + 2D bands in same raster
- dim_index()/dim_size() lookups including missing dims
- contiguous_data() returns Cow::Borrowed in Phase 1
- NdBuffer strides correct for UInt8/Float64/UInt16 at various shapes
- width()/height() returns None for raster with no bands
- Band name nullable: named vs unnamed bands, out-of-range
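The stride test depends on the element size of each data type. As a rough illustration of what "correct strides" could mean, the sketch below computes row-major (C-order) strides in bytes. Whether NdBuffer stores byte strides or element strides is not stated here, so the byte convention is an assumption, and `c_order_byte_strides` is a hypothetical helper:

```rust
/// Row-major (C-order) strides in bytes for a contiguous buffer.
/// Assumption: strides are expressed in bytes, with the last dimension
/// varying fastest. Hypothetical helper, not part of this PR.
fn c_order_byte_strides(shape: &[usize], elem_size: usize) -> Vec<usize> {
    let mut strides = vec![0; shape.len()];
    let mut step = elem_size;
    // Walk from the innermost dimension outward, accumulating the step size.
    for i in (0..shape.len()).rev() {
        strides[i] = step;
        step *= shape[i];
    }
    strides
}
```

Under that convention, a Float64 band shaped [3, 4, 5] has strides [160, 40, 8], while a UInt8 band shaped [4, 5] has strides [5, 1].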
Move the u32-to-BandDataType conversion from inline match in the array reader to BandDataType::try_from_u32() on the enum itself. Eliminates duplicated mapping logic.
- All 10 valid discriminants map correctly
- Invalid values (0, 11, u32::MAX) return None
- Round-trip: discriminant as u32 → try_from_u32 for all variants
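The pattern being tested can be sketched as follows. The enum name, variant names, and numbering here are illustrative assumptions chosen to match "10 valid discriminants, with 0 and 11 invalid"; they are not taken from BandDataType itself:

```rust
/// Illustrative enum only: variant names and numbering are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u32)]
enum DemoDataType {
    UInt8 = 1,
    Int8 = 2,
    UInt16 = 3,
    Int16 = 4,
    UInt32 = 5,
    Int32 = 6,
    UInt64 = 7,
    Int64 = 8,
    Float32 = 9,
    Float64 = 10,
}

impl DemoDataType {
    /// Convert a raw discriminant to a variant; unknown values give None.
    fn try_from_u32(value: u32) -> Option<Self> {
        match value {
            1 => Some(Self::UInt8),
            2 => Some(Self::Int8),
            3 => Some(Self::UInt16),
            4 => Some(Self::Int16),
            5 => Some(Self::UInt32),
            6 => Some(Self::Int32),
            7 => Some(Self::UInt64),
            8 => Some(Self::Int64),
            9 => Some(Self::Float32),
            10 => Some(Self::Float64),
            _ => None,
        }
    }
}
```

Keeping the mapping on the enum means a round-trip check (`variant as u32` fed back through `try_from_u32`) exercises the same single source of truth that readers of the array use.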
- Fix needless borrows flagged by clippy
- Suppress too_many_arguments on start_raster_2d (intentional API)
- cargo fmt across all modified crates
Add outdb_uri parameter to RasterBuilder::start_band() so OutDb bands can be constructed. Restore the RS_BandPath outdb tests that were weakened during the migration — they now properly test the Some(uri) code path again.
Update test outdb_uri values to follow the design convention: geotiff://s3://bucket/file.tif#band=N
- Replace unwrap() on width()/height() with proper error handling in rs_convexhull, rs_envelope, rs_size, rs_spatial_predicates
- Remove dead band_name_array field from BandRefImpl
- Add debug_assert! bounds checks on AffineMatrix::from_transform and RasterRefImpl::transform()
- Add finish_band() validation: exactly one data value per band
- Add start_band_2d() validation: reject when width/height are 0
- Add band_by_name() default method on RasterRef for Zarr workflows
Add parse_outdb_uri() utility that splits scheme://path#fragment into components. RS_BandPath now strips the internal scheme prefix and fragment, returning just the path portion to users — matching Sedona Spark's behavior where RS_BandPath returns a plain file path. Test outdb_uri values now use scheme-dispatched format (geotiff://s3://bucket/file.tif#band=1) while RS_BandPath output remains s3://bucket/file.tif.
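A minimal sketch of the scheme://path#fragment split described above. The function name comes from the commit message, but the signature, the `OutDbUri` struct, and the choice to split on the first `://` and the last `#` are assumptions made for illustration:

```rust
/// Components of a scheme-dispatched OutDb URI (hypothetical struct).
#[derive(Debug, PartialEq, Eq)]
struct OutDbUri<'a> {
    scheme: &'a str,
    path: &'a str,
    fragment: Option<&'a str>,
}

/// Split `scheme://path#fragment` into its parts.
/// Assumptions: the scheme ends at the first "://", and the fragment
/// starts at the last '#'. Returns None when no scheme is present.
fn parse_outdb_uri(uri: &str) -> Option<OutDbUri<'_>> {
    let (scheme, rest) = uri.split_once("://")?;
    if scheme.is_empty() {
        return None;
    }
    let (path, fragment) = match rest.rsplit_once('#') {
        Some((path, fragment)) => (path, Some(fragment)),
        None => (rest, None),
    };
    Some(OutDbUri { scheme, path, fragment })
}
```

With this split, `geotiff://s3://bucket/file.tif#band=1` yields scheme `geotiff`, path `s3://bucket/file.tif`, and fragment `band=1`, so returning only `path` gives the plain location that RS_BandPath is described as exposing.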
New dimension query functions for N-D rasters:
- RS_NumDimensions(raster [, band]) → Int32
- RS_DimNames(raster [, band]) → List<Utf8>
- RS_DimSize(raster, dim_name [, band]) → Int64 (null if dim missing)
- RS_Shape(raster [, band]) → List<Int64>
All accept an optional band index. When omitted, they default to band 0 and verify that all bands agree. If bands have different dimensionality, an error is returned, prompting the user to specify a band index.
19 new tests covering 2D/3D rasters, explicit band args, null handling, nonexistent dimensions, and mixed-dimensionality errors.
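The default-band rule can be sketched like this, using a simplified `Band` struct and a 0-based band index. Both are assumptions for illustration; the real functions operate on raster arrays and may index bands differently at the SQL level:

```rust
/// Simplified stand-in for a band: only its shape matters here.
#[derive(Debug, Clone)]
struct Band {
    shape: Vec<u64>,
}

/// Number of dimensions for an explicit band, or for all bands when they agree.
/// With `band = None`, band 0 is used as the reference and every other band
/// must have the same dimensionality; otherwise the caller must pick a band.
fn num_dimensions(bands: &[Band], band: Option<usize>) -> Result<usize, String> {
    match band {
        Some(i) => bands
            .get(i)
            .map(|b| b.shape.len())
            .ok_or_else(|| format!("band index {} out of range", i)),
        None => {
            let first = bands
                .first()
                .ok_or_else(|| "raster has no bands".to_string())?;
            let ndim = first.shape.len();
            if bands.iter().any(|b| b.shape.len() != ndim) {
                return Err("bands differ in dimensionality; specify a band index".to_string());
            }
            Ok(ndim)
        }
    }
}
```

A raster holding one 3D band and one 2D band fails the implicit query but succeeds once a band index is supplied.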
New N-D raster manipulation functions:
- RS_Slice(raster, dim_name, index): reduce a dimension by picking one index, removing it from the output
- RS_SliceRange(raster, dim_name, start, end): narrow a dimension to [start, end), keeping it with reduced size
- RS_DimToBand(raster, dim_name): promote a dimension into separate bands (e.g., 1 band [time=3,y,x] → 3 bands [y,x])
- RS_BandToDim(raster, dim_name): collapse all bands into one band with a new dimension (inverse of DimToBand)
All error on spatial dimension names (x_dim/y_dim). Phase 1 always materializes data (contiguous copies).
12 new tests including a DimToBand→BandToDim round-trip.
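The slicing semantics can be illustrated on a plain row-major buffer. This is a sketch under stated assumptions: the dimension is already resolved from its name to an axis index, data is contiguous in C-order, and the spatial-dimension check is omitted. The helper names are hypothetical and do not reflect the code in rs_slice.rs or rs_dim_band.rs:

```rust
/// Copied data plus its new shape.
type Sliced<T> = (Vec<T>, Vec<usize>);

/// Keep the half-open range [start, end) along `axis`; the axis stays,
/// with reduced size. Always copies, mirroring the "materialize" behavior.
fn slice_range<T: Copy>(
    data: &[T],
    shape: &[usize],
    axis: usize,
    start: usize,
    end: usize,
) -> Option<Sliced<T>> {
    if axis >= shape.len() || start >= end || end > shape[axis] {
        return None;
    }
    if data.len() != shape.iter().product::<usize>() {
        return None;
    }
    let outer: usize = shape[..axis].iter().product();
    let inner: usize = shape[axis + 1..].iter().product();
    let mut out = Vec::with_capacity(outer * (end - start) * inner);
    for o in 0..outer {
        // Each outer block holds shape[axis] runs of `inner` elements.
        let base = o * shape[axis] * inner;
        out.extend_from_slice(&data[base + start * inner..base + end * inner]);
    }
    let mut new_shape = shape.to_vec();
    new_shape[axis] = end - start;
    Some((out, new_shape))
}

/// Pick one index along `axis` and drop that axis from the shape.
fn slice_index<T: Copy>(data: &[T], shape: &[usize], axis: usize, index: usize) -> Option<Sliced<T>> {
    let (out, mut new_shape) = slice_range(data, shape, axis, index, index.checked_add(1)?)?;
    new_shape.remove(axis);
    Some((out, new_shape))
}

/// Promote `axis` into separate bands: one slice per index along it.
fn dim_to_bands<T: Copy>(data: &[T], shape: &[usize], axis: usize) -> Option<Vec<Sliced<T>>> {
    let size = *shape.get(axis)?;
    (0..size).map(|i| slice_index(data, shape, axis, i)).collect()
}
```

For a buffer shaped [2, 3, 4], `slice_index` on axis 0 returns a [3, 4] result, `slice_range` on axis 1 with [1, 3) returns [2, 2, 4], and `dim_to_bands` on axis 0 returns two [3, 4] bands.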
Add .qmd doc stubs for RS_NumDimensions, RS_DimNames, RS_DimSize, RS_Shape, RS_Slice, RS_SliceRange, RS_DimToBand, RS_BandToDim. Required by the docs-and-deploy CI check which validates every registered function has a documentation page.
9e9a550 to 5a4c8ac
Summary
Adds 8 new RS_* functions for querying and manipulating N-dimensional raster data. Depends on #749.
Dimension query functions (rs_dimensions.rs)
When the band argument is omitted, defaults to band 0 and verifies all bands agree; returns an error if bands have different dimensionality.
Slice functions (rs_slice.rs)
Dimension ↔ band functions (rs_dim_band.rs)
All slice/dim functions error on spatial dimensions (x_dim/y_dim). Phase 1 always materializes data as contiguous copies.
Test plan
- cargo clippy: zero warnings
- pre-commit: all hooks pass