[DISCUSSION] Usage of arrow-rs in other areas

> > In any case, at least at my company we probably have a few PiB of data written with this or an even earlier version.
>
> BTW this is so cool to hear

I don't want to spam this PR with too much side info, but I wanted to use this opportunity to share that a coworker and I will give the talk "Scaling Data Processing for Training Workloads at DeepL Research with Rust" at this year's PyCon DE / PyData in Darmstadt (Germany), where we go into a bit of detail about this!

Working with `arrow-rs` (+ `PyO3` as the Python binding layer) has been an absolute blast so far for coming up with a highly optimized and efficient deep learning data ingress pipeline. 

Especially compared to `pyarrow`, we've rarely or never seen issues with surprisingly high resource usage, memory leaks, or unexpectedly unsupported features. For example, I'm fairly sure that selectively decoding specific rows by row index, to reduce memory usage during sparse decoding, isn't possible in a non-clunky way with `pyarrow`. With `arrow-rs`'s `RowSelection` this was trivially easy, even as a feature exposed to Python. Happy to stay connected on this topic.

_Originally posted by @jonded94 in https://github.com/apache/arrow-rs/issues/9374#issuecomment-3909160504_
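To illustrate the row-index idea mentioned in the comment: in the `parquet` crate, a `RowSelection` is a run-length list of "skip n rows" / "select n rows" steps. The sketch below is not taken from the original post. It uses only the standard library, and the `Run` enum and `runs_from_indices` function are hypothetical names, made up to show how a sorted list of row indices could be turned into such runs.

```rust
/// One run in a run-length encoded row selection.
/// Hypothetical local type, mirroring the skip/select idea only.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Run {
    Skip(usize),
    Select(usize),
}

/// Convert strictly increasing row indices into skip/select runs
/// that together cover exactly `total_rows` rows.
fn runs_from_indices(indices: &[usize], total_rows: usize) -> Vec<Run> {
    let mut runs: Vec<Run> = Vec::new();
    let mut cursor = 0; // next row not yet covered by any run

    for &idx in indices {
        assert!(
            idx >= cursor && idx < total_rows,
            "indices must be sorted, unique and in range"
        );
        // Gap before the wanted row: emit a skip run.
        if idx > cursor {
            runs.push(Run::Skip(idx - cursor));
        }
        // Adjacent wanted rows extend the previous select run.
        if let Some(Run::Select(n)) = runs.last_mut() {
            *n += 1;
        } else {
            runs.push(Run::Select(1));
        }
        cursor = idx + 1;
    }

    // Rows after the last wanted index are skipped.
    if cursor < total_rows {
        runs.push(Run::Skip(total_rows - cursor));
    }
    runs
}
```

With the real crate, each run would presumably map to `RowSelector::skip(n)` or `RowSelector::select(n)`, collected into a `RowSelection` and handed to the reader builder through `with_row_selection`. How the original pipeline does this is not stated in the comment.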