feat: add N-D raster dimension query and manipulation functions #750

Draft
james-willis wants to merge 18 commits into apache:main from james-willis:jw/nd-raster-functions

@james-willis
Contributor

Summary

Adds 8 new RS_* functions for querying and manipulating N-dimensional raster data. Depends on #749.

Dimension query functions (rs_dimensions.rs)

  • RS_NumDimensions(raster [, band]) → Int32 — number of dimensions
  • RS_DimNames(raster [, band]) → List<Utf8> — ordered dimension names
  • RS_DimSize(raster, dim_name [, band]) → Int64 — size of a named dimension (null if missing)
  • RS_Shape(raster [, band]) → List<Int64> — full shape array

When the band argument is omitted, each function defaults to band 0 and first verifies that all bands agree; if the bands differ in dimensionality it returns an error, so the caller has to pass an explicit band index.
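
The default-band rule can be illustrated with a small stand-alone sketch. It is not the code in rs_dimensions.rs: `BandDims`, `resolve_band`, `num_dimensions` and `dim_size` are hypothetical names, the band index is assumed to be 0-based, and "agree" is assumed to mean identical dimension names and shape.

```rust
/// Hypothetical stand-in for one band's dimension metadata (not the PR's BandRef).
#[derive(Debug, Clone, PartialEq)]
pub struct BandDims {
    pub dim_names: Vec<String>,
    pub shape: Vec<i64>,
}

/// Pick the band a dimension query should read.
/// With an explicit index, only that band is consulted. With `None`, band 0 is
/// used, but only after checking that every band matches it (assumed rule).
pub fn resolve_band(bands: &[BandDims], band: Option<usize>) -> Result<&BandDims, String> {
    match band {
        Some(i) => bands
            .get(i)
            .ok_or_else(|| format!("band {} out of range ({} bands)", i, bands.len())),
        None => {
            let first = bands
                .first()
                .ok_or_else(|| "raster has no bands".to_string())?;
            if bands.iter().all(|b| b == first) {
                Ok(first)
            } else {
                Err("bands differ in dimensionality; pass a band index".to_string())
            }
        }
    }
}

/// Number of dimensions of the resolved band.
pub fn num_dimensions(bands: &[BandDims], band: Option<usize>) -> Result<i32, String> {
    Ok(resolve_band(bands, band)?.shape.len() as i32)
}

/// Size of a named dimension, or `None` when the band has no such dimension.
pub fn dim_size(
    bands: &[BandDims],
    dim_name: &str,
    band: Option<usize>,
) -> Result<Option<i64>, String> {
    let b = resolve_band(bands, band)?;
    Ok(b.dim_names
        .iter()
        .position(|n| n.as_str() == dim_name)
        .map(|i| b.shape[i]))
}
```

In this sketch a raster that mixes a 3D band with a 2D band fails when no band is given and succeeds once a band index is supplied, which is the behavior described above.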

Slice functions (rs_slice.rs)

  • RS_Slice(raster, dim_name, index) → Raster — reduce a dimension by picking one index
  • RS_SliceRange(raster, dim_name, start, end) → Raster — narrow a dimension to [start, end)

Dimension ↔ band functions (rs_dim_band.rs)

  • RS_DimToBand(raster, dim_name) → Raster — promote a dimension into separate bands
  • RS_BandToDim(raster, dim_name) → Raster — collapse bands into a new dimension

All slice and dimension ↔ band functions return an error when dim_name is a spatial dimension (x_dim/y_dim). Phase 1 always materializes the result as a contiguous copy.
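
The dimension ↔ band conversion and the spatial-dimension guard can be sketched as follows. Everything here is a hypothetical stand-in rather than the code in rs_dim_band.rs: the `Band` struct, the `f32` element type, and in particular the choice to place the dimension created by `band_to_dim` first are assumptions of this example.

```rust
/// Hypothetical, simplified band: row-major f32 data with named dimensions.
#[derive(Debug, Clone, PartialEq)]
pub struct Band {
    pub dim_names: Vec<String>,
    pub shape: Vec<usize>,
    pub data: Vec<f32>,
}

/// Split one band into `shape[axis]` bands, removing the named dimension.
pub fn dim_to_band(band: &Band, dim_name: &str, x_dim: &str, y_dim: &str) -> Result<Vec<Band>, String> {
    if dim_name == x_dim || dim_name == y_dim {
        return Err(format!("cannot promote spatial dimension '{}'", dim_name));
    }
    let axis = band
        .dim_names
        .iter()
        .position(|n| n.as_str() == dim_name)
        .ok_or_else(|| format!("no dimension named '{}'", dim_name))?;
    let n = band.shape[axis];
    let outer: usize = band.shape[..axis].iter().product();
    let inner: usize = band.shape[axis + 1..].iter().product();
    if band.data.len() != outer * n * inner {
        return Err("data length does not match shape".to_string());
    }
    let mut dim_names = band.dim_names.clone();
    dim_names.remove(axis);
    let mut shape = band.shape.clone();
    shape.remove(axis);
    let mut out = Vec::with_capacity(n);
    for k in 0..n {
        // Gather every `inner`-sized run that belongs to index k of the axis.
        let mut data = Vec::with_capacity(outer * inner);
        for o in 0..outer {
            let begin = (o * n + k) * inner;
            data.extend_from_slice(&band.data[begin..begin + inner]);
        }
        out.push(Band { dim_names: dim_names.clone(), shape: shape.clone(), data });
    }
    Ok(out)
}

/// Stack identically shaped bands into one band with a new leading dimension
/// (leading position is an assumption of this sketch).
pub fn band_to_dim(bands: &[Band], dim_name: &str) -> Result<Band, String> {
    let first = bands.first().ok_or_else(|| "no bands to collapse".to_string())?;
    if first.dim_names.iter().any(|n| n.as_str() == dim_name) {
        return Err(format!("dimension '{}' already exists", dim_name));
    }
    if bands.iter().any(|b| b.dim_names != first.dim_names || b.shape != first.shape) {
        return Err("bands must share dimension names and shape".to_string());
    }
    let mut dim_names = vec![dim_name.to_string()];
    dim_names.extend(first.dim_names.iter().cloned());
    let mut shape = vec![bands.len()];
    shape.extend(first.shape.iter().copied());
    let data: Vec<f32> = bands.iter().flat_map(|b| b.data.iter().copied()).collect();
    Ok(Band { dim_names, shape, data })
}
```

With a leading `time` dimension, promoting it to bands and collapsing the bands again reproduces the original band, which mirrors the round-trip test listed in the test plan.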

Test plan

  • 31 new tests across 3 files (19 dimension queries + 12 slice/dim-band)
  • All 174 tests pass (143 existing + 31 new)
  • cargo clippy — zero warnings
  • pre-commit — all hooks pass
  • Round-trip test: RS_DimToBand then RS_BandToDim recovers original data

Replace the legacy 2D raster schema with the N-dimensional layout:

- Remove metadata sub-struct (width/height/upperleft/scale/skew)
- Add transform: List<Float64> (6-element GDAL GeoTransform)
- Add x_dim/y_dim: Utf8 for explicit spatial dimension declaration
- Flatten band struct: name, dim_names, shape, data_type, nodata,
  strides, offset, outdb_uri, data
- Remove StorageType enum (OutDb indicated by non-null outdb_uri)
- Remove metadata_indices/band_metadata_indices modules
- Update raster_indices and band_indices for new layout

Downstream crates will not compile until subsequent commits
update them to use the new schema.
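
For readers unfamiliar with the 6-element GDAL GeoTransform that replaces the metadata sub-struct, the following sketch shows the standard GDAL coefficient order and how the removed legacy fields would map onto it. The function names are hypothetical, and the legacy-to-transform mapping is an assumption derived from the GDAL convention rather than taken from this PR's code.

```rust
/// Apply a GDAL-order GeoTransform to a pixel coordinate.
/// GDAL defines: x = gt[0] + col * gt[1] + row * gt[2]
///               y = gt[3] + col * gt[4] + row * gt[5]
/// Returns None when the slice does not hold exactly six coefficients.
pub fn pixel_to_world(gt: &[f64], col: f64, row: f64) -> Option<(f64, f64)> {
    if gt.len() != 6 {
        return None;
    }
    let x = gt[0] + col * gt[1] + row * gt[2];
    let y = gt[3] + col * gt[4] + row * gt[5];
    Some((x, y))
}

/// Assumed mapping from the legacy metadata fields to GDAL order.
pub fn transform_from_legacy(
    upper_left_x: f64,
    upper_left_y: f64,
    scale_x: f64,
    scale_y: f64,
    skew_x: f64,
    skew_y: f64,
) -> [f64; 6] {
    [upper_left_x, scale_x, skew_x, upper_left_y, skew_y, scale_y]
}
```

For a north-up raster the two skew terms are zero and scale_y is negative, so row indices grow downward while y coordinates decrease.
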
Replace the legacy trait hierarchy with N-D raster types:

- traits.rs: RasterRef (transform, x_dim, y_dim, width/height
  derived from band dims), BandRef (ndim, dim_names, shape,
  nd_buffer, contiguous_data returning Cow), NdBuffer struct
- array.rs: RasterStructArray reads new schema (transform list,
  x_dim/y_dim, flattened band fields with nested lists for
  dim_names/shape/strides), RasterRefImpl with band_boxed()
- builder.rs: start_raster/start_band with N-D params, plus
  start_raster_2d/start_band_2d convenience for legacy 2D usage
- affine_transformation.rs: AffineMatrix::from_transform(&[f64])
  replaces from_metadata(), free functions accept &dyn RasterRef
- display.rs: updated for new trait interface

Downstream crates will not compile until test utilities and
RS_* functions are updated in subsequent commits.

… test utils

Update sedona-raster core types and sedona-testing helpers for the
N-D raster schema:

sedona-raster:
- traits.rs: RasterRef with transform/x_dim/y_dim/width/height,
  BandRef with ndim/dim_names/shape/nd_buffer/contiguous_data(Cow),
  NdBuffer struct. band() returns Box<dyn BandRef>.
- array.rs: RasterStructArray reads new flattened schema with
  nested lists for dim_names/shape/strides
- builder.rs: start_raster/start_band with N-D params, plus
  start_raster_2d/start_band_2d convenience methods
- affine_transformation.rs: from_transform(&[f64]) replaces
  from_metadata(), free functions accept &dyn RasterRef
- display.rs: updated for new trait interface

sedona-testing:
- All raster helpers updated to use new builder API
- assert_raster_equal compares transforms, dims, shapes, data
- generate_multi_band_raster uses start_band_2d

Also fixes x_dim/y_dim schema to use Utf8View (matching builder).

sedona-raster-functions is not yet updated; that follows in the next commit.

Mechanically migrate all 14 RS_* function files from the legacy
raster API to the new N-D trait interface:

- raster.metadata().width/height → raster.width/height().unwrap()
- raster.metadata().upper_left_x/scale_x/etc → raster.transform()[i]
- raster.bands().len/band(n) → raster.num_bands/band(n-1)
- band.metadata().data_type/nodata/storage_type → band.data_type/nodata/outdb_uri
- band.data() → band.contiguous_data()
- AffineMatrix::from_metadata → from_transform
- Remove StorageType, RasterMetadata, BandMetadata imports
- Update all test helpers to use start_raster_2d/start_band_2d

All 140 existing tests pass with identical outputs.

Add 7 tests covering new N-dimensional raster capabilities:

- Non-standard spatial dim names (lon/lat): width()/height() work
- Mixed dimensionality: 3D + 2D bands in same raster
- dim_index()/dim_size() lookups including missing dims
- contiguous_data() returns Cow::Borrowed in Phase 1
- NdBuffer strides correct for UInt8/Float64/UInt16 at various shapes
- width()/height() returns None for raster with no bands
- Band name nullable: named vs unnamed bands, out-of-range
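
The stride and Cow behavior these tests exercise can be sketched in isolation. This is a hedged illustration, not the NdBuffer implementation: it assumes strides are expressed in bytes, that the buffer starts at offset 0, and that a non-contiguous layout would be handled by a gathering copy. Both function names are hypothetical.

```rust
use std::borrow::Cow;

/// Byte strides of a contiguous row-major array (last axis varies fastest).
pub fn contiguous_strides(shape: &[usize], elem_size: usize) -> Vec<usize> {
    let mut strides = vec![0; shape.len()];
    let mut step = elem_size;
    for (stride, &len) in strides.iter_mut().zip(shape).rev() {
        *stride = step;
        step *= len;
    }
    strides
}

/// Borrow the bytes when the layout is already contiguous; otherwise gather
/// the elements into a new row-major buffer.
pub fn contiguous_data<'a>(
    data: &'a [u8],
    shape: &[usize],
    strides: &[usize],
    elem_size: usize,
) -> Cow<'a, [u8]> {
    if strides == contiguous_strides(shape, elem_size).as_slice() {
        return Cow::Borrowed(data);
    }
    let count: usize = shape.iter().product();
    let mut out = Vec::with_capacity(count * elem_size);
    let mut index = vec![0usize; shape.len()];
    for _ in 0..count {
        let offset: usize = index.iter().zip(strides).map(|(i, s)| i * s).sum();
        out.extend_from_slice(&data[offset..offset + elem_size]);
        // Advance the multi-index, last axis first, carrying into earlier axes.
        for axis in (0..shape.len()).rev() {
            index[axis] += 1;
            if index[axis] < shape[axis] {
                break;
            }
            index[axis] = 0;
        }
    }
    Cow::Owned(out)
}
```

Under these assumptions a UInt8 band of shape [3, 4, 5] has strides [20, 5, 1] and a Float64 band of shape [4, 5] has strides [40, 8]. Only the borrowed branch corresponds to what the commit describes for Phase 1; the owned branch is included to show why the return type is a Cow.
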
Move the u32-to-BandDataType conversion from inline match in the
array reader to BandDataType::try_from_u32() on the enum itself.
Eliminates duplicated mapping logic.

- All 10 valid discriminants map correctly
- Invalid values (0, 11, u32::MAX) return None
- Round-trip: discriminant as u32 → try_from_u32 for all variants
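
The pattern of moving the discriminant mapping onto the enum can be shown with a reduced example. `SampleType` and its three variants are hypothetical stand-ins chosen for brevity; the real BandDataType has ten variants whose names and discriminant values are not listed here.

```rust
/// Hypothetical three-variant stand-in for a band data type enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u32)]
pub enum SampleType {
    UInt8 = 1,
    UInt16 = 2,
    Float64 = 3,
}

impl SampleType {
    /// Every variant, used for round-trip checks.
    pub const ALL: [SampleType; 3] = [SampleType::UInt8, SampleType::UInt16, SampleType::Float64];

    /// Single place where a stored u32 is mapped back to a variant.
    /// Unknown values yield None instead of panicking.
    pub fn try_from_u32(value: u32) -> Option<Self> {
        match value {
            1 => Some(Self::UInt8),
            2 => Some(Self::UInt16),
            3 => Some(Self::Float64),
            _ => None,
        }
    }
}
```

Keeping the match next to the enum definition means a new variant only needs to be added in one file, and a round-trip loop over `ALL` catches a forgotten arm.
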
- Fix needless borrows flagged by clippy
- Suppress too_many_arguments on start_raster_2d (intentional API)
- cargo fmt across all modified crates

Add outdb_uri parameter to RasterBuilder::start_band() so OutDb
bands can be constructed. Restore the RS_BandPath outdb tests
that were weakened during the migration — they now properly test
the Some(uri) code path again.

Update test outdb_uri values to follow the design convention:
geotiff://s3://bucket/file.tif#band=N

- Replace unwrap() on width()/height() with proper error handling
  in rs_convexhull, rs_envelope, rs_size, rs_spatial_predicates
- Remove dead band_name_array field from BandRefImpl
- Add debug_assert! bounds checks on AffineMatrix::from_transform
  and RasterRefImpl::transform()
- Add finish_band() validation: exactly one data value per band
- Add start_band_2d() validation: reject when width/height are 0
- Add band_by_name() default method on RasterRef for Zarr workflows

Add parse_outdb_uri() utility that splits scheme://path#fragment
into components. RS_BandPath now strips the internal scheme prefix
and fragment, returning just the path portion to users — matching
Sedona Spark's behavior where RS_BandPath returns a plain file path.

Test outdb_uri values now use scheme-dispatched format
(geotiff://s3://bucket/file.tif#band=1) while RS_BandPath output
remains s3://bucket/file.tif.
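
The scheme://path#fragment split described above could look roughly like the sketch below. It is not the PR's parse_outdb_uri(): the `OutDbUri` struct and `split_outdb_uri` function are hypothetical, and the handling of edge cases (empty scheme, missing fragment) is an assumption.

```rust
/// Hypothetical parsed form of an out-of-database band URI.
#[derive(Debug, PartialEq)]
pub struct OutDbUri<'a> {
    pub scheme: &'a str,
    pub path: &'a str,
    pub fragment: Option<&'a str>,
}

/// Split `scheme://path#fragment` into its parts.
/// Only the first "://" separates the scheme, so the path may itself be a URL
/// such as `s3://bucket/file.tif`. The fragment is everything after the last '#'.
pub fn split_outdb_uri<'a>(uri: &'a str) -> Option<OutDbUri<'a>> {
    let (scheme, rest) = uri.split_once("://")?;
    if scheme.is_empty() || rest.is_empty() {
        return None;
    }
    let (path, fragment) = match rest.rsplit_once('#') {
        Some((p, f)) => (p, Some(f)),
        None => (rest, None),
    };
    Some(OutDbUri { scheme, path, fragment })
}
```

Applied to geotiff://s3://bucket/file.tif#band=1, this yields the scheme geotiff, the path s3://bucket/file.tif and the fragment band=1, so a function in the style of RS_BandPath would return only the path component.
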
New dimension query functions for N-D rasters:

- RS_NumDimensions(raster [, band]) → Int32
- RS_DimNames(raster [, band]) → List<Utf8>
- RS_DimSize(raster, dim_name [, band]) → Int64 (null if dim missing)
- RS_Shape(raster [, band]) → List<Int64>

All accept an optional band index. When omitted, default to band 0
and verify all bands agree — error if bands have different
dimensionality, prompting user to specify a band index.

19 new tests covering 2D/3D rasters, explicit band args, null
handling, nonexistent dimensions, and mixed-dimensionality errors.

New N-D raster manipulation functions:

- RS_Slice(raster, dim_name, index) — reduce a dimension by picking
  one index, removing it from the output
- RS_SliceRange(raster, dim_name, start, end) — narrow a dimension
  to [start, end), keeping it with reduced size
- RS_DimToBand(raster, dim_name) — promote a dimension into separate
  bands (e.g., 1 band [time=3,y,x] → 3 bands [y,x])
- RS_BandToDim(raster, dim_name) — collapse all bands into one band
  with a new dimension (inverse of DimToBand)

All error on spatial dimension names (x_dim/y_dim). Phase 1 always
materializes data (contiguous copies). 12 new tests including
a DimToBand→BandToDim round-trip.

github-actions bot requested a review from paleolimbot on April 3, 2026, 20:03

Add .qmd doc stubs for RS_NumDimensions, RS_DimNames, RS_DimSize,
RS_Shape, RS_Slice, RS_SliceRange, RS_DimToBand, RS_BandToDim.
Required by the docs-and-deploy CI check which validates every
registered function has a documentation page.

james-willis force-pushed the jw/nd-raster-functions branch from 9e9a550 to 5a4c8ac on April 6, 2026, 17:01