Native Apache Arrow for the BEAM: IPC streaming, Arrow Flight, and ADBC database bindings. Column data lives in Rust buffers; Elixir holds lightweight opaque handles. Precompiled NIFs for Linux, macOS, and Windows — no Rust required to use.
- Why ExArrow was built
- What it brings to the Elixir ecosystem
- How ExArrow differs from Explorer, Nx, ADBC, and ExZarr
- Where ExArrow fits
- What this enables
- Requirements
- Installation
- Quick start
- Livebook tutorials
- IPC: stream and file
- Arrow Flight: client and server
- ADBC: database to Arrow streams
- Using ExArrow with Explorer
- Use case examples
- Benchmarks
- Documentation
- Development
- Roadmap
- FAQ
- License
The Arrow ecosystem has become the de-facto interchange standard for columnar data. Python, R, Rust, Java, Go, and C++ all speak Arrow natively. Data warehouses, query engines, stream processors, ML frameworks, and databases expose Arrow Flight endpoints or ADBC interfaces. The BEAM had no first-class way to participate in this ecosystem.
ExArrow was written to fill that gap. It gives Elixir and Erlang applications the same low-level, zero-copy Arrow primitives that the rest of the ecosystem already takes for granted — without requiring callers to understand NIF memory management, dirty schedulers, or the Arrow C Data Interface.
The design goal is intentionally narrow: be the reliable Arrow transport and interchange layer for the BEAM, and let other libraries (Explorer, Nx, etc.) do the analysis on top of it.
Prior to ExArrow, an Elixir application that needed to exchange data with a Flight server, query a database via ADBC, or read/write an Arrow IPC file had three options: shell out to Python, implement the protocol manually in Elixir (row-by-row, with all the copying that entails), or simply not do it.
ExArrow adds:
- IPC reading and writing — Arrow stream and file formats, from binary or a file path, in both directions. Read a file produced by PyArrow, DuckDB, or Pandas; write a file for the same consumers. No format conversion needed.
- Arrow Flight client and server — Connect to Dremio, InfluxDB IOx, Snowflake Flight endpoints, or any custom Flight service. Run an in-process echo server for testing. Transfer Arrow streams over gRPC with one API call.
- ADBC database connectivity — Execute SQL against any ADBC-compatible database (SQLite, PostgreSQL, DuckDB, BigQuery, Snowflake, and more) and receive the results as a lazy Arrow stream — never materialising rows into BEAM terms unless you ask for them.
- Zero-copy streaming — Column buffers are allocated once in Rust and held there until consumed. The BEAM scheduler is never stalled on large copies. Dirty NIF schedulers are used for blocking I/O.
- A uniform stream abstraction — `ExArrow.Stream` works identically for IPC, Flight, and ADBC results. Code that processes batches does not know or care where the data came from.
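To illustrate what a source-agnostic batch consumer might look like, here is a minimal sketch. The `BatchStream` module is hypothetical, not part of ExArrow's API; it only assumes the `next`-style cursor convention used throughout this README, where `nil` ends the stream and `{:error, _}` aborts it:

```elixir
defmodule BatchStream do
  @moduledoc """
  Hypothetical helper (not part of ExArrow): wrap any `next`-style cursor
  into a lazy Elixir Stream. Pass a zero-arity function such as
  `fn -> ExArrow.Stream.next(stream) end`; enumeration stops on `nil`
  or `{:error, _}`.
  """

  def to_stream(next_fun) when is_function(next_fun, 0) do
    Stream.repeatedly(next_fun)
    |> Stream.take_while(fn
      nil -> false
      {:error, _} -> false
      _batch -> true
    end)
  end
end
```

Because the cursor is passed in as a closure, the same helper would serve IPC, Flight, and ADBC results alike, e.g. `BatchStream.to_stream(fn -> ExArrow.Stream.next(stream) end) |> Enum.each(&process/1)`.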
These libraries are complementary, not competing. Each has a distinct role.
| Library | Role | Overlap with ExArrow |
|---|---|---|
| Explorer | In-memory dataframe analysis (filter, group, sort, plot). Backed by Polars/Arrow internally. | Explorer can load/dump Arrow IPC streams. ExArrow is the transport; Explorer is the analysis layer. |
| Nx | Numerical computing and tensor operations (multi-dimensional arrays, GPU support, ML). | Nx tensors and Arrow columns are both typed flat arrays. There is currently no direct bridge, but ExArrow IPC can produce data for downstream tensor conversion. |
| adbc (livebook-dev) | Elixir wrapper around the ADBC C library for driver management — downloading and configuring drivers. | ExArrow uses adbc optionally for driver download; adbc's core purpose is driver lifecycle, not Arrow streaming or Flight. |
| ExZarr | Read/write Zarr v2/v3 chunked array format (used in climate science, genomics, cloud-native ND arrays). | Zarr and Arrow are complementary storage formats. ExZarr addresses ND chunk storage; ExArrow addresses columnar interchange and network transport. |
In short: ExArrow is a transport and interchange library. It moves Arrow data between processes, databases, services, and files as efficiently as possible. It does not analyse, transform, or visualise data — that is the job of Explorer, Nx, or your own application logic.
```mermaid
flowchart TB
    App("Your Elixir Application")
    App --> Explorer("Explorer\ndataframes & analysis")
    App --> Nx("Nx\ntensors & ML")
    App --> ExArrow("ExArrow\nIPC · Flight · ADBC")
    App --> ExZarr("ExZarr\nZarr chunked arrays")
    ExArrow --> IPC("Arrow IPC\nstream & file")
    ExArrow --> FlightSvr("Arrow Flight\ngRPC server")
    ExArrow --> ADBCDrv("ADBC\ndriver")
    IPC -. "interop via IPC binary" .-> Explorer
    FlightSvr --> FlightSvcs("Dremio · InfluxDB IOx\nDuckDB · Snowflake")
    ADBCDrv --> Databases("PostgreSQL · SQLite\nDuckDB · BigQuery")
    classDef app fill:#1a1a2e,stroke:#4a90d9,color:#e0e0e0,rx:6
    classDef lib fill:#16213e,stroke:#4a90d9,color:#e0e0e0,rx:6
    classDef proto fill:#0f3460,stroke:#4a90d9,color:#e0e0e0,rx:6
    classDef external fill:#1a1a2e,stroke:#888,color:#aaa,rx:6,stroke-dasharray:4 4
    class App app
    class Explorer,Nx,ExArrow,ExZarr lib
    class IPC,FlightSvr,ADBCDrv proto
    class FlightSvcs,Databases external
```
ExArrow sits at the boundary between the BEAM and the Arrow ecosystem. It speaks the protocols that data infrastructure uses — IPC, Flight, ADBC — and surfaces them as idiomatic Elixir APIs. Explorer and Nx sit above it and consume the data it delivers.
- Elixir as a data pipeline node. Read Arrow IPC from Kafka, HTTP, or a socket; apply lightweight routing or filtering; forward via Flight or write to file — without ever copying column data into BEAM terms.
- Zero-copy query results. Run SQL against PostgreSQL, DuckDB, SQLite, or BigQuery via ADBC. The result stream is backed by native Arrow buffers. A 100-million-row result set uses minimal BEAM heap regardless of size.
- Interop with the Python/R data world. Read files produced by PyArrow, Pandas, or Polars. Write files that DuckDB, R's arrow package, or any Arrow consumer can read. No CSV conversion, no schema translation.
- First-class Flight client. Connect to Dremio, InfluxDB IOx, or any service that exposes an Arrow Flight endpoint. List flights, fetch schemas, stream data, or call custom actions — from a Phoenix controller, a GenServer, or a Livebook cell.
- Benchmarked, observable performance. The included Benchee suite quantifies the zero-copy advantage and publishes results per commit at thanos.github.io/ex_arrow/dev/bench.
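The constant-memory claim above can be made concrete with a sketch. The `RowTotal` module below is hypothetical, not part of ExArrow; it assumes only the `next`-style cursor and a per-batch row-count accessor like `ExArrow.RecordBatch.num_rows/1` from the Quick start section:

```elixir
defmodule RowTotal do
  # Hypothetical sketch (not ExArrow API): fold over batches from a
  # `next`-style cursor while holding at most one batch handle at a time,
  # so BEAM heap usage stays flat regardless of result-set size.
  def count(next_fun, num_rows_fun) do
    Stream.repeatedly(next_fun)
    |> Stream.take_while(fn b -> b != nil and not match?({:error, _}, b) end)
    |> Enum.reduce(0, fn batch, acc -> acc + num_rows_fun.(batch) end)
  end
end
```

With ExArrow this would be invoked as `RowTotal.count(fn -> ExArrow.Stream.next(stream) end, &ExArrow.RecordBatch.num_rows/1)`, touching only the batch handles, never the column data.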
- Elixir ~> 1.14 (OTP 25 / NIF 2.15 and OTP 26+ / NIF 2.16)
Add the dependency:
```elixir
def deps do
  [{:ex_arrow, "~> 0.1.0"}]
end
```

Using precompiled NIFs (default)
After mix deps.get and mix compile, ExArrow downloads a prebuilt NIF for
your platform from the project's GitHub releases. No Rust or C toolchain is
required. Supported platforms: Linux x86_64/aarch64, macOS x86_64/arm64,
Windows x86_64.
Building from source
If no precompiled NIF exists for your platform, or you are developing ExArrow
itself, set EX_ARROW_BUILD=1 and have Rust installed:
```shell
EX_ARROW_BUILD=1 mix deps.get
EX_ARROW_BUILD=1 mix compile
```

The optional dependency {:rustler, "~> 0.32.0", optional: true} is required
for source builds and is already listed in ExArrow's own mix.exs.
For path dependencies (e.g. Livebook or Mix.install), add rustler
explicitly and have Rust available:
```elixir
Mix.install([
  {:ex_arrow, path: "/path/to/ex_arrow"},
  {:rustler, "~> 0.37.3", optional: true}
])
```

Alternatively, use the published Hex package so the precompiled NIF is used
and no Rust is needed: Mix.install([{:ex_arrow, "~> 0.1.0"}]).
Read an Arrow IPC stream:
```elixir
{:ok, stream} = ExArrow.IPC.Reader.from_file("/path/to/data.arrow")
{:ok, schema} = ExArrow.Stream.schema(stream)
fields = ExArrow.Schema.fields(schema)

case ExArrow.Stream.next(stream) do
  %ExArrow.RecordBatch{} = batch -> IO.inspect(ExArrow.RecordBatch.num_rows(batch))
  nil -> :done
  {:error, msg} -> IO.puts("Error: #{msg}")
end
```

Connect to an Arrow Flight server:
```elixir
{:ok, client} = ExArrow.Flight.Client.connect("localhost", 9999, [])
{:ok, stream} = ExArrow.Flight.Client.do_get(client, "my_ticket")
{:ok, schema} = ExArrow.Stream.schema(stream)
batch = ExArrow.Stream.next(stream)
```

Query a database with ADBC:
```elixir
{:ok, db} = ExArrow.ADBC.Database.open(driver_name: "adbc_driver_sqlite", uri: ":memory:")
{:ok, conn} = ExArrow.ADBC.Connection.open(db)
{:ok, stmt} = ExArrow.ADBC.Statement.new(conn, "SELECT 1 AS n")
{:ok, stream} = ExArrow.ADBC.Statement.execute(stmt)
{:ok, schema} = ExArrow.Stream.schema(stream)
batch = ExArrow.Stream.next(stream)
```

Interactive notebooks (open in Livebook):
- Quick start — IPC, Flight, and ADBC in one notebook.
- 01 IPC — Stream vs file format, read/write, schema, Explorer interop.
- 02 Flight — Echo server, client, list_flights, get_schema, actions.
- 03 ADBC — Database, Connection, Statement, Stream, metadata APIs.
See livebook/README.md for run instructions.
Stream (sequential) — from binary or file path:
```elixir
{:ok, stream} = ExArrow.IPC.Reader.from_binary(ipc_bytes)
{:ok, stream} = ExArrow.IPC.Reader.from_file("/data/events.arrow")
{:ok, schema} = ExArrow.Stream.schema(stream)

Stream.repeatedly(fn -> ExArrow.Stream.next(stream) end)
|> Enum.take_while(&(&1 != nil and not match?({:error, _}, &1)))
```

Write to binary or file:
```elixir
{:ok, binary} = ExArrow.IPC.Writer.to_binary(schema, batches)
:ok = ExArrow.IPC.Writer.to_file("/out/result.arrow", schema, batches)
```

File format (random access):
```elixir
{:ok, file} = ExArrow.IPC.File.from_file("/data/large.arrow")
{:ok, schema} = ExArrow.IPC.File.schema(file)
n = ExArrow.IPC.File.batch_count(file)
{:ok, batch} = ExArrow.IPC.File.get_batch(file, 0)
```

Start the built-in echo server:
```elixir
{:ok, server} = ExArrow.Flight.Server.start_link(9999)
{:ok, port} = ExArrow.Flight.Server.port(server)
:ok = ExArrow.Flight.Server.stop(server)
```

Transfer data:
```elixir
{:ok, client} = ExArrow.Flight.Client.connect("localhost", 9999, [])
:ok = ExArrow.Flight.Client.do_put(client, schema, [batch1, batch2])
{:ok, stream} = ExArrow.Flight.Client.do_get(client, "echo")
batch = ExArrow.Stream.next(stream)
```

Metadata:
```elixir
{:ok, flights} = ExArrow.Flight.Client.list_flights(client, <<>>)
{:ok, info} = ExArrow.Flight.Client.get_flight_info(client, {:cmd, "echo"})
{:ok, schema} = ExArrow.Flight.Client.get_schema(client, {:cmd, "echo"})
{:ok, actions} = ExArrow.Flight.Client.list_actions(client)
{:ok, ["pong"]} = ExArrow.Flight.Client.do_action(client, "ping", <<>>)
```

Flight is plaintext only in this release. Products that speak Arrow Flight include Dremio, InfluxDB IOx, and custom analytics servers.
SQLite in-memory:
```elixir
{:ok, db} = ExArrow.ADBC.Database.open(driver_name: "adbc_driver_sqlite", uri: ":memory:")
{:ok, conn} = ExArrow.ADBC.Connection.open(db)
{:ok, stmt} = ExArrow.ADBC.Statement.new(conn, "SELECT 1 AS n, 'hello' AS s")
{:ok, stream} = ExArrow.ADBC.Statement.execute(stmt)
batch = ExArrow.Stream.next(stream)
```

PostgreSQL:
```elixir
{:ok, db} = ExArrow.ADBC.Database.open(
  driver_name: "adbc_driver_postgresql",
  uri: "postgresql://user:pass@localhost:5432/mydb"
)
{:ok, conn} = ExArrow.ADBC.Connection.open(db)
{:ok, stmt} = ExArrow.ADBC.Statement.new(conn, "SELECT id, name FROM users")
{:ok, stream} = ExArrow.ADBC.Statement.execute(stmt)
```

Metadata:
```elixir
{:ok, types_stream} = ExArrow.ADBC.Connection.get_table_types(conn)
{:ok, schema} = ExArrow.ADBC.Connection.get_table_schema(conn, nil, nil, "users")
{:ok, objs_stream} = ExArrow.ADBC.Connection.get_objects(conn, depth: "tables")
```

Optional driver download via the adbc package:
```elixir
# Add {:adbc, "~> 0.7"} to deps, then:
Adbc.download_driver!(:sqlite)
{:ok, db} = ExArrow.ADBC.Database.open(driver_name: "adbc_driver_sqlite", uri: ":memory:")
```

Or use the convenience helper, which calls Adbc.download_driver!/1 when the
package is available: ExArrow.ADBC.DriverHelper.ensure_driver_and_open/2.
Explorer handles in-memory analysis. ExArrow handles streaming and transport. They connect via Arrow IPC.
ExArrow to Explorer:
```elixir
{:ok, stream} = ExArrow.IPC.Reader.from_file("/data/source.arrow")
{:ok, schema} = ExArrow.Stream.schema(stream)

batches =
  Stream.repeatedly(fn -> ExArrow.Stream.next(stream) end)
  |> Enum.take_while(fn nil -> false; {:error, _} -> false; _ -> true end)

{:ok, binary} = ExArrow.IPC.Writer.to_binary(schema, batches)
df = Explorer.DataFrame.load_ipc_stream!(binary)
```

Explorer to ExArrow:
```elixir
df = Explorer.DataFrame.new(x: [1, 2, 3], y: ["a", "b", "c"])
binary = Explorer.DataFrame.dump_ipc_stream!(df)
{:ok, stream} = ExArrow.IPC.Reader.from_binary(binary)
batch = ExArrow.Stream.next(stream)
```

Data pipeline node: ingest an Arrow IPC stream and write it to disk.

```elixir
ipc_bytes = get_arrow_stream_from_http_or_kafka()
{:ok, stream} = ExArrow.IPC.Reader.from_binary(ipc_bytes)
{:ok, schema} = ExArrow.Stream.schema(stream)

batches =
  Stream.repeatedly(fn -> ExArrow.Stream.next(stream) end)
  |> Enum.take_while(fn nil -> false; {:error, _} -> false; _ -> true end)

:ok = ExArrow.IPC.Writer.to_file("/data/ingested.arrow", schema, batches)
```

Query a database via ADBC and forward the results over Flight.

```elixir
{:ok, db} = ExArrow.ADBC.Database.open(driver_name: "adbc_driver_sqlite", uri: "file:report.db")
{:ok, conn} = ExArrow.ADBC.Connection.open(db)
{:ok, stmt} = ExArrow.ADBC.Statement.new(conn, "SELECT * FROM sales WHERE year = 2024")
{:ok, stream} = ExArrow.ADBC.Statement.execute(stmt)
{:ok, schema} = ExArrow.Stream.schema(stream)

batches =
  Stream.repeatedly(fn -> ExArrow.Stream.next(stream) end)
  |> Enum.take_while(fn nil -> false; {:error, _} -> false; _ -> true end)

{:ok, client} = ExArrow.Flight.Client.connect("flight.example.com", 32010, [])
:ok = ExArrow.Flight.Client.do_put(client, schema, batches)
```

Consume a remote Flight service.

```elixir
{:ok, client} = ExArrow.Flight.Client.connect("dremio.example.com", 32010, connect_timeout_ms: 5_000)
{:ok, flights} = ExArrow.Flight.Client.list_flights(client, <<>>)
{:ok, stream} = ExArrow.Flight.Client.do_get(client, ticket_from_service)
batch = ExArrow.Stream.next(stream)
```

Interoperate with the Python/R data world.

```elixir
# Read a file written by PyArrow or Pandas
{:ok, file} = ExArrow.IPC.File.from_file("/data/from_python.arrow")
{:ok, schema} = ExArrow.IPC.File.schema(file)
n = ExArrow.IPC.File.batch_count(file)

for i <- 0..(n - 1) do
  {:ok, batch} = ExArrow.IPC.File.get_batch(file, i)
  # process batch
end

# Write for Python, R, or DuckDB
:ok = ExArrow.IPC.Writer.to_file("/data/for_python.arrow", schema, batches)
```

Stream query results from PostgreSQL to a Flight sink.

```elixir
{:ok, db} = ExArrow.ADBC.Database.open(driver_name: "adbc_driver_postgresql",
  uri: "postgresql://localhost/mydb")
{:ok, conn} = ExArrow.ADBC.Connection.open(db)
{:ok, stmt} = ExArrow.ADBC.Statement.new(conn, "SELECT * FROM sensor_readings")
{:ok, stream} = ExArrow.ADBC.Statement.execute(stmt)
{:ok, schema} = ExArrow.Stream.schema(stream)

batches =
  Stream.repeatedly(fn -> ExArrow.Stream.next(stream) end)
  |> Enum.take_while(fn nil -> false; {:error, _} -> false; _ -> true end)

{:ok, client} = ExArrow.Flight.Client.connect("flight.internal", 32010, [])
:ok = ExArrow.Flight.Client.do_put(client, schema, batches)
```

ExArrow ships a Benchee-based benchmark suite in bench/ that quantifies the
zero-copy streaming advantage over row-oriented alternatives.
Benchee is a :dev-only dependency; MIX_ENV=dev is required.
```shell
MIX_ENV=dev mix run bench/ipc_read_bench.exs  # single suite
MIX_ENV=dev mix run bench/run_all.exs         # all suites
MIX_ENV=dev mix bench                         # convenience alias
```

HTML reports are written to bench/output/ (gitignored).
| File | What it measures |
|---|---|
| ipc_read_bench.exs | Stream handle vs materialise — BEAM memory saved by keeping data native |
| ipc_write_bench.exs | IPC serialisation vs :erlang.term_to_binary — columnar vs row-oriented write |
| flight_bench.exs | Flight do_put / do_get / roundtrip latency with in-process server |
| adbc_bench.exs | Stream handle vs schema peek vs full collect |
| pipeline_bench.exs | End-to-end: IPC file on disk to Flight do_put without materialising in BEAM |
Results from every push to main are published at:
https://thanos.github.io/ex_arrow/dev/bench/
The CI workflow posts a PR alert comment when any scenario regresses more than 20% relative to the previous baseline.
- Memory model — handles, copying rules, NIF scheduling
- IPC guide — stream vs file, types, limitations
- Flight guide — server, client, timeouts, security
- ADBC guide — driver loading, metadata, binding
- Benchmarks guide — suites, CI publishing, interpreting results
API reference: mix docs or hexdocs.pm/ex_arrow.
```shell
mix deps.get
EX_ARROW_BUILD=1 mix compile  # build NIF from source
mix test                      # exclude :adbc / :adbc_package tags if no drivers installed
mix docs                      # generate ExDoc
MIX_ENV=dev mix bench         # run benchmark suite
```

Local CI script (runs format, credo, dialyzer, tests, coverage, docs):

```shell
script/ci
```

The items below represent the planned direction for ExArrow. Contributions are welcome for any of them.
- TLS for Arrow Flight — encrypted connections for non-loopback Flight endpoints (mTLS and system CA store).
- Flight server routing — configurable ticket-to-dataset mapping so the built-in server can serve multiple named datasets, not just the last upload.
- Larger test matrix — integration tests against PostgreSQL, DuckDB, and BigQuery ADBC drivers in CI.
- ADBC connection pooling — first-class NimblePool-backed pool exposed through the public API.
- Arrow compute kernels — thin NIF bindings to arrow-compute for filter/project/sort on native buffers without materialising into BEAM.
- Parquet support — read and write Parquet files via the Arrow Rust parquet crate; complement Explorer's Parquet support with a streaming API.
- Explorer bridge module — ExArrow.Explorer for direct conversion between ExArrow.Stream / ExArrow.RecordBatch and Explorer.DataFrame without the IPC round-trip.
- Nx bridge module — ExArrow.Nx for converting a record batch column into an Nx.Tensor without copying through BEAM binary.
- Flight SQL — the Flight SQL protocol for databases that expose it (DuckDB, CockroachDB, Dremio).
- Streaming writes to Parquet and Delta Lake — sink for data pipeline nodes.
- OTel / telemetry integration — :telemetry events for IPC read/write throughput, Flight request latency, and ADBC query duration.
- Windows aarch64 precompiled NIF — once GitHub-hosted Windows arm64 runners are generally available.
When should I use ExArrow? Use ExArrow when you need to read or write Arrow IPC (stream or file), connect to an Arrow Flight server (Dremio, InfluxDB IOx, custom), or run SQL via ADBC and receive Arrow result streams. Good fit for data pipelines, ETL, and interchange with systems that already speak Arrow.
When should I not use ExArrow? Do not use it as a dataframe or query engine. For in-memory analysis, filtering, grouping, and plotting, use Explorer. Do not use it as a replacement for Ecto when you only need normal SQL results. For Parquet-only workflows with no Flight/ADBC, consider Explorer's Parquet support first.
Can I use ExArrow and Explorer together?
Yes. ExArrow handles transport and protocol layers. Use
ExArrow.IPC.Writer.to_binary/2 to produce IPC, then
Explorer.DataFrame.load_ipc_stream!/1 to load it. In the other direction,
Explorer.DataFrame.dump_ipc_stream!/1 produces bytes that
ExArrow.IPC.Reader.from_binary/1 can read.
Why do I get a 404 or "couldn't fetch NIF" on compile?
Precompiled NIFs are hosted on GitHub releases. If you are on an unsupported
platform or an unreleased version, the download fails. Set EX_ARROW_BUILD=1,
install Rust, and run mix compile to build from source.
Is Arrow Flight over TLS supported? Not yet. Flight in this release is plaintext only. Use on localhost or trusted networks. TLS is on the roadmap for v0.2.
Which ADBC drivers are supported?
Any ADBC driver that provides a shared library — for example
adbc_driver_sqlite, adbc_driver_postgresql, or the DuckDB ADBC driver. You
must install the driver and pass its path, or ensure the driver manager can find
it. Metadata and binding support depend on the individual driver.
MIT. See LICENSE for details. Copyright (c) 2025 Thanos Vassilakis.