Optimize make_geocube() when converting a GeoDataFrame with multiple columns

Hello there, I found this library through a Stack Overflow answer you posted, and I have to say I really dig it.

I'm working on a project where I have a GeoDataFrame (GDF) that needs to be rasterized, and this turned out to be the perfect solution. However, the GDFs coming through the pipeline will be fairly big (150K rows x 700 columns). Right now the rasterization step is the bottleneck: it takes a little over an hour, while the other operations finish in minutes. We can cut down the resolution of some of this data on our end, but it seems like there is room to optimize the function.

For example, a single column with the 150K shapely Point features rasterizes in about 5 seconds using the 'nearest' interpolation method. I believe it should be possible to run that 5-second step, which aligns the output grid with the nearest vector features, just once and then apply the same pattern to the other n columns of data. That way the expensive step is paid once, and the time to execute the function no longer scales up with n.

For the 'nearest' option, imagine rasterizing a numbered index associated with the geometry features, and then simply mapping the remaining columns onto that index pattern (something akin to pandas take()). I'm not sure exactly what's going on under the hood, but I imagine something analogous could be done for the other interpolation options.
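To make the idea above concrete, here is a rough sketch in plain NumPy. It is not how geocube works internally; `nearest_index_grid` and `rasterize_columns` are hypothetical names made up for illustration, and the brute-force distance search is only there to keep the example short (at 150K points a KD-tree would be the sensible choice).

```python
import numpy as np


def nearest_index_grid(px, py, grid_x, grid_y):
    """Return, for every grid cell, the row position of the nearest point.

    Hypothetical helper: brute force for readability, not for speed.
    """
    gx, gy = np.meshgrid(grid_x, grid_y)                # each has shape (ny, nx)
    cells = np.column_stack([gx.ravel(), gy.ravel()])   # (ny * nx, 2)
    points = np.column_stack([px, py])                  # (n_points, 2)
    # Squared distance from every cell to every point: (n_cells, n_points).
    d2 = ((cells[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1).reshape(gx.shape)


def rasterize_columns(values, index_grid):
    """Map every column onto the grid with a single take-style lookup.

    values: (n_points, n_columns) array; returns (n_columns, ny, nx).
    """
    # values[index_grid] has shape (ny, nx, n_columns); move columns to the front.
    return np.moveaxis(values[index_grid], -1, 0)


# Four points and two data columns (stand-ins for the 150K x 700 case).
px = np.array([0.0, 1.0, 0.0, 1.0])
py = np.array([0.0, 0.0, 1.0, 1.0])
values = np.array([[10, 100], [20, 200], [30, 300], [40, 400]])

# The nearest-neighbour search runs once...
index_grid = nearest_index_grid(px, py, grid_x=[0.1, 0.9], grid_y=[0.2, 0.8])
# ...and every column reuses the resulting index pattern.
cube = rasterize_columns(values, index_grid)
print(index_grid)
print(cube.shape)
```

The cost of the search depends only on the geometry and the grid, so adding columns only adds the cost of the indexing step.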


Optimize make_geocube() when converting a GeoDataFrame with multiple columns #56
