An implementation of Apache Arrow in Mojo. The initial motivation was to learn Mojo while doing something useful, and since I've been involved in Apache Arrow for a while it seemed a natural fit. The project has grown beyond a prototype: it now has a full Python binding layer, SIMD compute kernels, GPU acceleration, and benchmarks showing it outperforms PyArrow on array construction for common numeric and string workloads.
Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized, language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware like CPUs and GPUs.
Mojo is a new programming language built on MLIR that combines Python's expressiveness with the performance of systems programming languages.
Arrow should be a first-class citizen in Mojo's ecosystem. This implementation provides zero-copy interoperability with PyArrow via the Arrow C Data Interface, and serves as a foundation for high-performance data processing in Mojo.
### Array types

- `PrimitiveArray[T]` — numeric and boolean arrays with type aliases: `BoolArray`, `Int8Array`…`Int64Array`, `UInt8Array`…`UInt64Array`, `Float32Array`, `Float64Array`
- `StringArray` — UTF-8 variable-length strings
- `ListArray` — variable-length nested arrays
- `FixedSizeListArray` — fixed-size nested arrays (embedding vectors, coordinates)
- `StructArray` — named-field structs
- `ChunkedArray` — array split across multiple chunks
- `AnyArray` — type-erased immutable array container (O(1) copy via `ArcPointer`)
- `RecordBatch` — schema + column arrays, with slice, select, rename, add/remove/set column operations
- `Table` — schema + chunked columns; `from_batches()`, `to_batches()`, `combine_chunks()`
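The variable-length types above (such as `StringArray`) follow Arrow's columnar buffer layout: a validity bitmap, an offsets buffer, and a contiguous data buffer. The plain-Python sketch below illustrates that layout per the Arrow format spec; it is not marrow's actual internals.

```python
# Illustrative sketch (plain Python, not marrow's internals) of how an Arrow
# string array lays out ["hello", None, "world"] in three buffers.

def build_string_buffers(values):
    """Return (validity_bitmap, offsets, data) for a list of optional strings."""
    validity = bytearray((len(values) + 7) // 8)  # 1 bit per element, LSB-first
    offsets = [0]                                 # len(values) + 1 entries
    data = bytearray()                            # concatenated UTF-8 bytes
    for i, v in enumerate(values):
        if v is not None:
            validity[i // 8] |= 1 << (i % 8)      # mark slot i valid
            data += v.encode("utf-8")
        # null slots consume no data; the offset simply repeats
        offsets.append(len(data))
    return bytes(validity), offsets, bytes(data)

validity, offsets, data = build_string_buffers(["hello", None, "world"])
# value i lives at data[offsets[i]:offsets[i+1]] when its validity bit is set
print(offsets)           # [0, 5, 5, 10]
print(data)              # b'helloworld'
print(bin(validity[0]))  # 0b101 — bits 0 and 2 set, bit 1 (the null) clear
```

Slicing such an array is zero-copy because a slice only adjusts the logical offset and length; the three buffers are shared.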
### Scalar types

- `PrimitiveScalar[T]`, `StringScalar`, `ListScalar`, `StructScalar` — typed scalars holding native values
- `AnyScalar` — type-erased scalar backed by a length-1 `AnyArray`
### Builders — incrementally build immutable arrays

- `PrimitiveBuilder[T]`, `StringBuilder`, `ListBuilder`, `FixedSizeListBuilder`, `StructBuilder`
- `AnyBuilder` — type-erased builder using function-pointer vtable dispatch (O(1) copy via `ArcPointer`)
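The vtable-dispatch idea behind `AnyBuilder` can be sketched in plain Python: the wrapper stores bound function references ("function pointers") to a concrete builder's methods once, then forwards every call through them. All names below are illustrative, not marrow's real API.

```python
# Hypothetical sketch of vtable-style type erasure: one wrapper type drives
# any concrete builder through stored function references.

class IntBuilder:
    def __init__(self): self.values = []
    def append(self, v): self.values.append(v)
    def finish(self): return ("int64", self.values)

class StrBuilder:
    def __init__(self): self.parts = []
    def append(self, v): self.parts.append(v)
    def finish(self): return ("string", self.parts)

class AnyBuilder:
    """Type-erased wrapper: all calls dispatch through the stored 'vtable'."""
    def __init__(self, concrete):
        self._append = concrete.append   # vtable entries, bound once
        self._finish = concrete.finish
    def append(self, v): self._append(v)
    def finish(self): return self._finish()

b = AnyBuilder(IntBuilder())
b.append(1); b.append(2)
print(b.finish())   # ('int64', [1, 2])
```

In Mojo this pattern matters more than in Python: without dynamic duck typing, a fixed table of function pointers is what lets one concrete `AnyBuilder` type wrap builders of any element type while staying cheap to copy.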
### Compute kernels (SIMD-vectorized, null-aware)

- Arithmetic: `add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `neg`, `abs_`, `min_`, `max_`
- Math (unary): `sign`, `sqrt`, `exp`, `exp2`, `log`, `log2`, `log10`, `log1p`, `floor`, `ceil`, `trunc`, `round`, `sin`, `cos`
- Math (binary): `pow_`
- Comparisons: `equal`, `not_equal`, `less`, `less_equal`, `greater`, `greater_equal` → `BoolArray` (CPU + GPU)
- Aggregates: `sum_`, `product`, `min_`, `max_`, `any_`, `all_` (null-skipping)
- Group-by: `groupby(keys, values, aggregations)` — fused hash+aggregate, returns `RecordBatch`
- Hashing: `hash_` for primitive, string, and struct arrays
- Selection: `filter_`, `drop_nulls`
- Strings: `string_lengths`
- Similarity: `cosine_similarity` (batch N-vectors vs 1 query, CPU SIMD + GPU)
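"Null-aware" here means a binary kernel computes values unconditionally over the data buffers and then ANDs the two validity bitmaps, so the result is null wherever either input was null. A plain-Python sketch of that pattern (illustrative only; the real kernels do this per SIMD lane):

```python
# Null-aware binary kernel sketch: branch-free over the values, bitmap AND
# over the validity. Validity bitmaps are modeled as ints, bit i = slot i.

def add_null_aware(lhs, lhs_valid, rhs, rhs_valid):
    out = [a + b for a, b in zip(lhs, rhs)]   # no per-element null branches
    out_valid = lhs_valid & rhs_valid          # null in either input wins
    return out, out_valid

# [1, 2, _, 4] + [10, _, 30, 40]   (validity bits LSB-first; _ = null slot,
# whose data value is an arbitrary placeholder)
values, valid = add_null_aware([1, 2, 0, 4], 0b1011, [10, 0, 30, 40], 0b1101)
print(values)       # [11, 2, 30, 44] — slots 1 and 2 are garbage but masked
print(bin(valid))   # 0b1001 — only slots 0 and 3 are valid
```

Computing the masked slots anyway is deliberate: it keeps the inner loop branch-free, which is what lets it vectorize.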
### Expression execution (marrow/expr)

- Build lazy expression trees with `col()`, `lit()`, `if_else()` and operator overloads (`+`, `-`, `*`, `/`, `>`, `<`, `==`, `&`, `|`, …)
- Relational plan nodes: `InMemoryTable`, `Filter`, `Project`, `ParquetScan`, `Aggregate` with `.filter()`, `.select()`, `.aggregate()` chaining
- Pull-based streaming executor: `Planner` compiles a plan into typed processor trees; `execute()` collects `RecordBatch` results
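A pull-based streaming executor can be sketched in a few lines of plain Python: each plan node pulls batches from its child on demand, so data flows through the tree one batch at a time. Names below are illustrative, not marrow's processor types.

```python
# Minimal pull-based executor sketch: generators as plan nodes, batches as
# lists of row dicts. Each node pulls from its child only when asked.

def scan(batches):                  # leaf: emits source batches
    yield from batches

def filter_node(child, predicate):  # keeps rows where predicate holds
    for batch in child:
        kept = [row for row in batch if predicate(row)]
        if kept:
            yield kept

def project_node(child, columns):   # keeps only the named columns
    for batch in child:
        yield [{c: row[c] for c in columns} for row in batch]

batches = [[{"age": 25, "name": "Alice"}, {"age": 35, "name": "Bob"}],
           [{"age": 45, "name": "Carol"}]]
plan = project_node(filter_node(scan(batches), lambda r: r["age"] > 30),
                    ["name", "age"])
print(list(plan))  # [[{'name': 'Bob', 'age': 35}], [{'name': 'Carol', 'age': 45}]]
```

The pull model keeps memory bounded: only one batch per pipeline stage is alive at a time, regardless of how large the source table is.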
### Parquet I/O (marrow/parquet)

- `read_table(path)` — read a Parquet file into a marrow `Table`
- `write_table(table, path)` — write a marrow `Table` to Parquet
### Python bindings — `import marrow as ma`

- `array(values, type=None)` — create any array type from Python lists with type inference
- All compute kernels exposed as free functions
- Full null handling, type coercion, nested structure support
### Interoperability

- Arrow C Data Interface — zero-copy exchange with PyArrow
- GPU acceleration via Mojo's `DeviceContext` (Metal on Apple Silicon, CUDA on NVIDIA)
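For orientation, the Arrow C Data Interface that powers the zero-copy exchange is defined by two small C structs. The ctypes transcription below follows the published Arrow specification; it is shown to explain the mechanism and is not taken from marrow's source.

```python
# The two structs of the Arrow C Data Interface, transcribed from the spec.
import ctypes

class ArrowSchema(ctypes.Structure):
    pass

class ArrowArray(ctypes.Structure):
    pass

ArrowSchema._fields_ = [
    ("format", ctypes.c_char_p),    # type encoding, e.g. b"l" = int64
    ("name", ctypes.c_char_p),
    ("metadata", ctypes.c_char_p),
    ("flags", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowSchema))),
    ("dictionary", ctypes.POINTER(ArrowSchema)),
    ("release", ctypes.c_void_p),   # producer's release callback
    ("private_data", ctypes.c_void_p),
]

ArrowArray._fields_ = [
    ("length", ctypes.c_int64),
    ("null_count", ctypes.c_int64),
    ("offset", ctypes.c_int64),
    ("n_buffers", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("buffers", ctypes.POINTER(ctypes.c_void_p)),  # validity, offsets, data, …
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowArray))),
    ("dictionary", ctypes.POINTER(ArrowArray)),
    ("release", ctypes.c_void_p),
    ("private_data", ctypes.c_void_p),
]

schema = ArrowSchema(format=b"l")   # an int64 column with no children
print(schema.format)                # b'l'
```

Consumers receive pointers to these structs (wrapped in PyCapsules by `__arrow_c_array__()`), read the buffers in place, and are expected to invoke `release` when done — the callback marrow cannot yet provide, per the limitations section.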
```shell
pixi run -e dev build_python   # compile marrow.so
```

```python
import marrow as ma

# ── Array construction ────────────────────────────────────────────────────────

# Primitive arrays — type inference
a = ma.array([1, 2, 3, None, 5])      # int64 with one null
f = ma.array([1.0, 2.5, None, 4.0])   # float64

# Explicit types
a = ma.array([1, 2, 3, None, 5], type=ma.int64())

# Strings
s = ma.array(["hello", None, "world"])

# Nested lists
nested = ma.array([[1, 2], [3, 4, 5], None])

# Struct arrays — automatic type inference from dict keys
structs = ma.array([{"x": 1, "y": 1.5}, {"x": 2, "y": 2.5}])

# With explicit schema
t = ma.struct([ma.field("x", ma.int64()), ma.field("y", ma.float64())])
structs = ma.array([{"x": 1, "y": 1.5}, {"x": 2, "y": 2.5}], type=t)

# ── Arithmetic (null-propagating) ─────────────────────────────────────────────
b = ma.array([10, 20, 30, None, 50])
result = ma.add(a, b)   # null where either input is null
result = ma.sub(a, b)
result = ma.mul(a, b)
result = ma.div(a, b)

# ── Aggregates (null-skipping) ────────────────────────────────────────────────
ma.sum_(a)       # → 11.0 (skips the null at index 3)
ma.product(a)    # → 30.0
ma.min_(a)       # → 1.0
ma.max_(a)       # → 5.0
ma.any_(ma.array([False, True, None]))   # → True
ma.all_(ma.array([True, True, None]))    # → True

# ── Selection ─────────────────────────────────────────────────────────────────
mask = ma.array([True, False, True, False, True])
ma.filter_(a, mask)   # [1, 3, 5]
ma.drop_nulls(a)      # [1, 2, 3, 5] (removes index 3)

# ── Array methods ─────────────────────────────────────────────────────────────
len(a)            # 5
a.null_count()    # 1
a.type()          # int64
a.slice(1, 3)     # [2, 3, None] — zero-copy
a[0]              # 1
str(a)            # "Int64Array([1, 2, 3, NULL, 5])"

# Struct field access
structs.field(0)     # Int64Array — field "x"
structs.field("y")   # Float64Array — field "y"
```

```mojo
from marrow.arrays import array, PrimitiveArray, StringArray, BoolArray
from marrow.dtypes import int8, int32, int64, bool_, list_

# Factory function — list of optionals
var a = array[int32]([1, 2, 3, 4, 5])
var b = array[int64]([1, None, 3, None, 5])   # nulls at index 1 and 3
var c = array[bool_]([True, False, True])
```

```mojo
from marrow.builders import PrimitiveBuilder, StringBuilder, ListBuilder

# Primitive
var pb = PrimitiveBuilder[int64](capacity=4)
pb.append(10)
pb.append(20)
pb.append_null()
pb.append(40)
var arr: Int64Array = pb.finish()

# String
var sb = StringBuilder()
sb.append("hello")
sb.append_null()
sb.append("world")
var strs: StringArray = sb.finish()

# List of int32 — append child elements, then commit each list element
var child = PrimitiveBuilder[int32]()
child.append(1)
child.append(2)
var lb = ListBuilder(child^)   # moves child into the builder
lb.append(True)                # [1, 2] is the first list element
lb.values().append(3)          # child element for the next list
lb.append(True)                # [3] is the second list element
lb.append_null()               # null third element
var lists: ListArray = lb.finish()
```

All arrays implement `Writable` so they print directly:

```mojo
print(arr)    # Int64Array([10, 20, NULL, 40])
print(strs)   # StringArray([hello, NULL, world])
```

```mojo
from marrow.kernels.arithmetic import add, sub, mul, div, sqrt, log, sin
from marrow.kernels.aggregate import sum_, min_, max_, any_, all_
from marrow.kernels.filter import filter_, drop_nulls
from marrow.kernels.compare import equal, less, greater_equal
from marrow.kernels.groupby import groupby

var x = array[int64]([1, 2, 3, 4])
var y = array[int64]([10, 20, 30, 40])
var z = add(x, y)            # Int64Array([11, 22, 33, 44])
var total = sum_[int64](x)   # 10
var filtered = filter_[int64](x, array[bool_]([True, False, True, False]))

var a = array[int64]([1, 2, 3, 4])
var b = array[int64]([1, 3, 2, 4])
var eq = equal(a, b)   # BoolArray([true, false, false, true])
var lt = less(a, b)    # BoolArray([false, true, false, false])

# Unary math (floating-point)
var f = array[float64]([1.0, 4.0, 9.0, 16.0])
var s = sqrt(f)   # Float64Array([1.0, 2.0, 3.0, 4.0])
var l = log(f)    # natural log

# Group-by
var keys = array[int64]([1, 2, 1, 2, 1])
var vals = array[float64]([10.0, 20.0, 30.0, 40.0, 50.0])
var result = groupby(keys, [vals], ["sum"])   # RecordBatch: key=[1,2], sum=[90.0, 60.0]
```

```mojo
from marrow.expr import col, lit, in_memory_table, execute, ExecutionContext
from marrow.tabular import record_batch

var batch = record_batch(
    [array[int64]([25, 35, 45]), array[String](["Alice", "Bob", "Carol"])],
    names=["age", "name"],
)
var plan = in_memory_table(batch)
    .filter(col("age") > lit(30))
    .select(col("name"), col("age"))
var ctx = ExecutionContext()
var results = execute(plan, ctx)   # List[RecordBatch]
```

```mojo
from marrow.parquet import read_table, write_table

var tbl = read_table("data.parquet")
write_table(tbl, "output.parquet")
```

```mojo
from std.python import Python
from marrow.c_data import CArrowArray, CArrowSchema

var pa = Python.import_module("pyarrow")
var pyarr = pa.array([1, 2, 3, 4, 5], mask=[False, False, False, False, True])
var capsules = pyarr.__arrow_c_array__()

var dtype = CArrowSchema.from_pycapsule(capsules[0]).to_dtype()   # int64
var data = CArrowArray.from_pycapsule(capsules[1])^.to_array(dtype)
var typed = data.as_int64()
print(typed.is_valid(0))     # True
print(typed.is_valid(4))     # False (null)
print(typed.unsafe_get(0))   # 1
```

Python array construction vs PyArrow (n=100,000 elements, Apple M-series, mean time):
| Array type | marrow | PyArrow | speedup |
|---|---|---|---|
| int64 (explicit type) | 0.30 ms | 0.92 ms | 3.0x faster |
| int64 + nulls (explicit) | 0.30 ms | 0.91 ms | 3.0x faster |
| float64 (explicit) | 0.28 ms | 0.48 ms | 1.7x faster |
| float64 + nulls | 0.28 ms | 0.52 ms | 1.8x faster |
| string (explicit) | 0.81 ms | 1.07 ms | 1.3x faster |
| string + nulls | 0.80 ms | 1.04 ms | 1.3x faster |
| struct, primitive fields | 4.64 ms | 6.35 ms | 1.4x faster |
| int64 (inferred) | 1.58 ms | 1.28 ms | 1.2x slower |
| string (inferred) | 0.92 ms | 1.01 ms | ~parity |
| nested list (inferred) | 0.61 ms | 2.37 ms | 3.9x faster |
When the array type is provided explicitly, marrow's builder path is faster than PyArrow's for numeric and string types. Type inference involves a Python-side scan to detect the type, which adds overhead; this gap will narrow as the inference path is optimized.
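The inference overhead comes from scanning the input before building: something along the lines of the single-pass sketch below (hypothetical and simplified, not marrow's actual algorithm) has to run before the typed builder path can start, and the explicit `type=` argument skips it entirely.

```python
# Hypothetical single-pass type-inference scan over a Python list. Real
# inference also handles nested lists, dicts (structs), and type conflicts.

def infer_type(values):
    saw_float = saw_int = saw_str = False
    for v in values:
        if v is None:
            continue                  # nulls don't affect the inferred type
        if isinstance(v, bool):       # bool before int: bool subclasses int
            return "bool"             # simplified: assume homogeneous bools
        if isinstance(v, float):
            saw_float = True
        elif isinstance(v, int):
            saw_int = True
        elif isinstance(v, str):
            saw_str = True
    if saw_str:
        return "string"
    if saw_float:
        return "float64"              # ints promote to float64 when mixed
    if saw_int:
        return "int64"
    return "null"                     # all-null input

print(infer_type([1, 2, None, 5]))   # int64
print(infer_type([1, 2.5, None]))    # float64
print(infer_type(["a", None]))       # string
```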
Run the benchmarks yourself:
```shell
pixi run -e bench bench_python       # Python array construction vs PyArrow
pixi run -e bench bench              # CPU SIMD arithmetic benchmarks
pixi run -e bench bench_similarity   # cosine similarity: CPU vs GPU

# Side-by-side comparison table: marrow vs polars vs pyarrow vs duckdb
pixi run -e bench pytest --benchmark --no-mojo python/tests/bench_compute.py --competition
pixi run -e bench pytest --benchmark --no-mojo python/tests/bench_join.py --competition
```

GPU kernels are available for compute-intensive operations when a `DeviceContext` is provided. Benchmarked on Apple Silicon (M-series, Metal, unified memory):
Cosine similarity (batch N-vectors vs 1 query, dim=768):
| Vectors | CPU SIMD | GPU (upload per call) | GPU (pre-loaded) |
|---|---|---|---|
| 10 K | baseline | 2–3x slower | ~1x (crossover) |
| 100 K | baseline | ~1x | ~3x faster |
| 500 K | baseline | — | ~13x faster |
The key pattern: upload data to the GPU once, run multiple kernels, download results at the end. The crossover vs CPU SIMD is around 10K vectors at dim≥384.
Element-wise arithmetic (add, mul, etc.) is faster on CPU SIMD — data transfer overhead dominates for low arithmetic-intensity operations.
```mojo
from std.gpu.host import DeviceContext
from marrow.kernels.similarity import cosine_similarity

# Pre-load data onto the GPU once
var ctx = DeviceContext()
var vectors_gpu = vectors.to_device(ctx)
var query_gpu = query.to_device(ctx)

# Run many similarity searches without re-uploading
var scores = cosine_similarity(vectors_gpu, query_gpu, ctx)
```

- C Data Interface: Release callbacks are not invoked (Mojo cannot pass a callback to a C function yet). Consuming Arrow data from PyArrow works; producing data back to PyArrow via the release mechanism is not fully implemented.
- Testing: Conformance against the Arrow specification is verified through PyArrow since Mojo has no JSON library yet. Full integration testing requires a Mojo JSON reader.
- Type coverage: Only boolean, numeric, string, list, fixed-size list, and struct types are implemented. Date/time, dictionary, union, decimal, and binary types are not yet supported.
- Parquet I/O: Parquet support currently bridges through PyArrow. Native Mojo Parquet reading is planned for a future release.
- GPU null handling: Binary arithmetic kernels on the GPU do not propagate null bitmaps (GPU `bitmap_and` is not yet implemented). Null-aware GPU arithmetic is CPU-only for now.
Install pixi. The project uses pixi environments to keep optional dependencies out of the default install:

| Environment | Activate with | What it includes |
|---|---|---|
| `dev` | `-e dev` | pyarrow, pytest, ruff — daily dev and testing |
| `asan` | `-e asan` | dev + libcompiler-rt for AddressSanitizer runs |
| `bench` | `-e bench` | dev + polars, duckdb, rich for comparison benchmarks |
| `format` | `-e format` | ruff only |
| `docs` | `-e docs` | jupyter, quarto |
```shell
# testing
pixi run -e dev test              # all tests (Mojo + Python)
pixi run -e dev test_mojo         # Mojo unit tests only
pixi run -e dev test_python       # Python binding tests only

# benchmarks
pixi run -e bench bench           # all benchmarks
pixi run -e bench bench_mojo      # Mojo benchmarks only
pixi run -e bench bench_python    # Python vs PyArrow benchmarks only

# formatting
pixi run -e dev fmt               # format all code (Mojo + Python)

# AddressSanitizer
pixi run -e asan test_mojo_asan   # Mojo tests under ASAN
```

The Python shared library (`python/marrow.so`) is built automatically before each test run — no manual `build_python` step required.
Use pytest directly to run a single test file or a specific test case:

```shell
# entire file
pixi run -e dev pytest marrow/kernels/tests/test_join.mojo

# single test
pixi run -e dev pytest marrow/kernels/tests/test_join.mojo::test_collision_left_join

# verbose output
pixi run -e dev pytest -v marrow/tests/test_arrays.mojo
```

| Option | Effect |
|---|---|
| `--mojo` / `--no-mojo` | Select or exclude Mojo tests |
| `--python` / `--no-python` | Select or exclude Python tests |
| `--gpu` / `--no-gpu` | Select or exclude GPU tests |
| `--benchmark` | Include benchmark files (`bench_*.mojo` / `bench_*.py`); also switches to `-O3` |
| `--asan` | Enable AddressSanitizer (use `-e asan` environment) |
| `--competition` | After benchmarks, print a side-by-side comparison table across all measured libraries |
Test files (`test_*.mojo`) use `TestSuite` from `marrow.testing`:

```mojo
from marrow.testing import TestSuite
from testing import assert_true

def test_something() raises:
    assert_true(1 + 1 == 2)

def main():
    TestSuite.run[__functions_in_module()]()
```

`TestSuite.run` auto-discovers every `test_*` function in the module. No registration needed — just name the function with the `test_` prefix.
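The auto-discovery mechanism is easy to picture with a plain-Python analogue (illustrative only): collect every callable in a namespace whose name starts with `test_` and call it.

```python
# Plain-Python analogue of TestSuite.run's prefix-based auto-discovery.

def test_addition():
    assert 1 + 1 == 2

def test_strings():
    assert "a" + "b" == "ab"

def run_suite(namespace):
    """Run every test_* callable in the namespace, in name order."""
    ran = []
    for name, fn in sorted(namespace.items()):
        if name.startswith("test_") and callable(fn):
            fn()               # a failed assert raises and fails the suite
            ran.append(name)
    return ran

print(run_suite({"test_addition": test_addition,
                 "test_strings": test_strings}))
```

In marrow's Mojo harness, `__functions_in_module()` plays the role of the namespace argument: it hands `TestSuite.run` the module's functions at compile time, so discovery costs nothing at runtime.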
Benchmark files (`bench_*.mojo`) use `BenchSuite` and `Benchmark` from `marrow.testing`:

```mojo
from marrow.testing import BenchSuite, Benchmark, BenchMetric

def bench_my_kernel(mut b: Benchmark) raises:
    var data = _prepare_data(N)
    b.throughput(BenchMetric.elements, N)

    @always_inline
    @parameter
    def call():
        keep(my_kernel(data))

    b.iter[call]()

def main():
    BenchSuite.run[__functions_in_module()]()
```

`BenchSuite.run` auto-discovers every `bench_*` function. For multiple sizes, define a shared helper and one thin wrapper per size:

```mojo
def _bench_kernel(mut b: Benchmark, n: Int) raises:
    ...

def bench_kernel_10k(mut b: Benchmark) raises: _bench_kernel(b, 10_000)
def bench_kernel_100k(mut b: Benchmark) raises: _bench_kernel(b, 100_000)
def bench_kernel_1m(mut b: Benchmark) raises: _bench_kernel(b, 1_000_000)
```

The test harness compiles each Mojo test runner to a binary in `.test_runners/` using `mojo build`. Runner files are named by a content hash of the selected tests, so the binary path is stable across runs with the same test selection. On the second run `mojo build` detects the existing binary and skips recompilation, reducing cold-start time from ~5 s to ~1 s. Up to 10 runner/binary pairs are kept; older ones are pruned automatically.
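The caching scheme can be sketched as follows (simplified and hypothetical in its details, such as names and the pruning policy): hash the sorted test selection to get a stable binary name, reuse the binary on a hit, and prune the oldest runners beyond the cap.

```python
# Sketch of a content-hash-keyed build cache with pruning.
import hashlib
import tempfile
from pathlib import Path

def runner_path(cache_dir: Path, selected_tests: list) -> Path:
    # sorted() makes the hash independent of selection order
    digest = hashlib.sha256("\n".join(sorted(selected_tests)).encode()).hexdigest()
    return cache_dir / f"runner_{digest[:16]}"

def build_if_missing(cache_dir: Path, selected_tests: list, max_kept: int = 10):
    path = runner_path(cache_dir, selected_tests)
    if path.exists():
        return path, False                 # cache hit: skip the rebuild
    path.write_bytes(b"compiled binary")   # stand-in for the real `mojo build`
    # prune the oldest runners beyond max_kept
    runners = sorted(cache_dir.glob("runner_*"), key=lambda p: p.stat().st_mtime)
    for old in runners[:-max_kept]:
        old.unlink()
    return path, True

cache = Path(tempfile.mkdtemp())
_, built_first = build_if_missing(cache, ["test_add", "test_filter"])
_, built_again = build_if_missing(cache, ["test_filter", "test_add"])
print(built_first, built_again)   # True False — same selection, stable hash
```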
If the project matures, the goal is to contribute it upstream to the Apache Arrow project.
If compilation fails on macOS, make sure you have the Metal toolchain installed:

```shell
xcodebuild -downloadComponent MetalToolchain
```
