dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
dbt is the T in ELT. Organize, cleanse, denormalize, filter, rename, and pre-aggregate the raw data in your warehouse so that it's ready for analysis.
GizmoSQL is an Apache Arrow Flight-based SQL engine for data warehouses. It is designed to be fast, scalable, and easy to use.
It has DuckDB and SQLite back-ends. You can see more information about GizmoSQL here.
This adapter provides feature parity with dbt-duckdb for all features applicable to a remote Flight SQL connection:
- Table and View (SQL and Python)
- Incremental with four strategies:
  - `append` -- simple record additions
  - `delete+insert` -- key-based upserts with DuckDB's `DELETE ... USING` syntax
  - `merge` -- uses DuckDB's `MERGE` with `UPDATE BY NAME` / `INSERT BY NAME`
  - `microbatch` -- time-based batch processing via `event_time` windows
- Snapshot (`check` and `timestamp` modes) using an UPDATE+INSERT pattern
- Schema change handling: `ignore`, `append_new_columns`, `sync_all_columns`, `fail` (a configuration sketch follows this list)
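For illustration, here is a minimal sketch of an incremental model combining one of the strategies above with schema-change handling. The model and column names (`stg_orders`, `order_id`, `updated_at`) are hypothetical; the config keys are standard dbt:

```sql
-- Hypothetical incremental model using the merge strategy
{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='order_id',
    on_schema_change='append_new_columns'
) }}

select order_id, customer_id, amount, updated_at
from {{ ref('stg_orders') }}

{% if is_incremental() %}
  -- Only pull rows newer than what is already in the target table
  where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```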
Python models execute client-side using a local DuckDB instance for full API compatibility, then ship results to GizmoSQL via ADBC bulk ingest:
```python
def model(dbt, session):
    dbt.config(materialized="table")
    df = dbt.ref("upstream_model")
    df = df.filter(df.amount > 100)
    return df
```

- Supports DuckDB relations, pandas DataFrames, and PyArrow Tables as return types
- `dbt.ref()` and `dbt.source()` fetch data from GizmoSQL as Arrow and expose it as DuckDB relations
- Incremental Python models are supported, with proper `dbt.is_incremental` handling (see the sketch below)
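As a rough sketch of an incremental Python model: the table and column names are hypothetical, and the watermark query uses the `remote_sql` escape hatch described below; the exact pattern you use may differ.

```python
def model(dbt, session):
    dbt.config(materialized="incremental", unique_key="event_id")

    events = dbt.ref("stg_events")  # fetched from GizmoSQL as a DuckDB relation

    if dbt.is_incremental:
        # Ask the server for the current high-water mark, then filter the new
        # rows client-side. Assumes dbt.this renders a name the server resolves.
        watermark = session.remote_sql(
            f"select max(event_ts) from {dbt.this}"
        ).fetchone()[0]
        if watermark is not None:
            events = events.filter(f"event_ts > TIMESTAMP '{watermark}'")

    return events
```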
Because Python models run client-side, dbt.ref('big_table').filter(...) pulls the entire upstream table over the network before filtering locally. When you only need a small slice of a large server-side table, use session.remote_sql(query) to push the query down to the GizmoSQL server — the filter/aggregation runs server-side and only the result crosses the wire:
```python
def model(dbt, session):
    dbt.config(materialized="table")
    schema = dbt.this.schema
    # Runs on the GizmoSQL server; only matching rows come back.
    return session.remote_sql(
        f"select * from {schema}.big_table where name = 'Joe'"
    )
```

`remote_sql()` returns a chainable local DuckDB relation, so you can combine it with the usual `.filter()`, `.project()`, `.df()`, pandas, etc. The rest of the `session` object behaves exactly like a local DuckDB connection (`session.sql(...)`, `session.register(...)`, ...) -- `remote_sql` is an additive escape hatch, not a replacement.
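A chained usage sketch (the server-side table and column names are hypothetical): push the heavy aggregation to the server, then post-process the small result locally.

```python
def model(dbt, session):
    dbt.config(materialized="table")

    # Aggregation runs on the GizmoSQL server; only the grouped result comes back.
    daily = session.remote_sql(
        "select event_date, count(*) as n from analytics.events group by event_date"
    )

    # The result is a local DuckDB relation, so normal chaining works.
    return daily.filter("n > 1000").df()
```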
Seeds are loaded using DuckDB's CSV reader on the client side with ADBC bulk ingest to the server:
- Correct null handling (empty CSV fields become SQL `NULL`, not the string `'null'`)
- Proper type inference (dates detected as `DATE`, integers as `BIGINT`, etc.)
- Supports `column_types` overrides and custom delimiters (see the sketch below)
- Significantly faster than dbt's default batch `INSERT` path
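For example, a sketch of a seed configuration using standard dbt seed configs (the project and seed names are hypothetical; the `+delimiter` config requires a reasonably recent dbt-core):

```yaml
# dbt_project.yml
seeds:
  my_project:
    country_codes:
      +column_types:
        country_code: varchar(2)
        population: bigint
      +delimiter: ";"
```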
All constraint types are enforced: CHECK, NOT NULL, UNIQUE, PRIMARY KEY, FOREIGN KEY.
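Constraints are declared through dbt's standard model contract syntax in a properties file; the model and column names below are hypothetical:

```yaml
# models/schema.yml
models:
  - name: dim_customers
    config:
      contract:
        enforced: true
    columns:
      - name: customer_id
        data_type: bigint
        constraints:
          - type: not_null
          - type: primary_key
      - name: email
        data_type: varchar
        constraints:
          - type: unique
```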
- `persist_docs` support (`COMMENT ON` for relations and columns)
- Full catalog generation with `dbt docs generate`

DuckDB-compatible overrides for: `dateadd`, `last_day`, `listagg`, `split_part`.
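For instance, dbt's standard cross-database macros resolve to DuckDB-compatible SQL; the model and column names below are illustrative:

```sql
select
    order_id,
    {{ dbt.dateadd('day', 7, 'order_date') }}   as due_date,
    {{ dbt.last_day('order_date', 'month') }}   as month_end,
    {{ dbt.split_part('sku', "'-'", 1) }}       as sku_prefix
from {{ ref('stg_orders') }}
```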
Because GizmoSQL is essentially a remote, server-side DuckDB, dbt-gizmosql supports the external materialization — models that are backed by Parquet, CSV, or JSON files rather than database tables — with the same config surface as dbt-duckdb.
The important distinction: the COPY runs on the GizmoSQL server, not on your dbt client. That's usually what you want. A production GizmoSQL deployment typically sits on a powerful cloud VM (lots of CPU, lots of RAM, fast local disks, a fat NIC, IAM role for blob storage). Pushing the write to that box is far faster than streaming a multi-GB result set back to the client just to write it out again — and the server's credentials/network topology are exactly what's needed to reach the destination.
```sql
{{ config(materialized='external', location='/data/warehouse/fact_orders.parquet') }}

select m.*, s.id is not null as has_source_id
from {{ ref('upstream_model') }} m
left join {{ source('upstream', 'source') }} s using (id)
```

| Option | Default | Description |
|---|---|---|
| `location` | `external_location` macro | Server-side path (or S3/GCS/Azure URI) to write to. See below. |
| `format` | `parquet` | One of `parquet`, `csv`, `json`. Inferred from the `location` extension when omitted. |
| `delimiter` | `,` | For CSV, the field delimiter. |
| `options` | `{}` | Any other options for DuckDB's `COPY` statement, e.g. `compression`, `partition_by`, `codec`, `per_thread_output`. |
| `parquet_read_options` | `{'union_by_name': False}` | Options passed to `read_parquet()` when building the read-side view. |
| `csv_read_options` | `{'auto_detect': True}` | Options passed to `read_csv()`. |
| `json_read_options` | `{'auto_detect': True}` | Options passed to `read_json()`. |
If location is omitted, the file is written to {external_root}/{model_name}.{format}. Set external_root in your profile to control the default write location (local path or cloud URI):
```yaml
my-gizmosql-db:
  target: dev
  outputs:
    dev:
      type: gizmosql
      host: gizmosql.prod.example.com
      port: 31337
      auth_type: external
      use_encryption: True
      external_root: "s3://my-warehouse/dbt-output"
```

The `external_root` is resolved on the GizmoSQL server, so any path/URI the server's DuckDB backend can reach works: local filesystem paths, `s3://...`, `gs://...`, `azure://...`, etc. Any credentials needed to write there live on the server, not on your dbt client.
After the write, dbt-gizmosql creates a view over the file via read_parquet / read_csv / read_json, so downstream models can ref() the external model like any other relation.
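For example, a downstream model can simply `ref()` the external model and read through the generated view (model and column names are hypothetical):

```sql
select
    order_month,
    sum(amount) as total_amount
from {{ ref('fact_orders') }}
group by order_month
```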
```sql
{{ config(
    materialized='external',
    format='parquet',
    options={'partition_by': 'year, month', 'compression': 'zstd'}
) }}

select * from {{ ref('fact_events') }}
```

- The directory referenced by `external_root` (or the parent directory of an explicit `location`) must already exist on the server -- DuckDB's `COPY ... TO '<file>'` does not create parent directories for single-file writes. Cloud URIs like `s3://bucket/prefix` don't have this constraint.
- Incremental strategies are not supported on `external` models -- each run fully replaces the file(s).
- dbt-duckdb's `plugin`/`glue_register` options are not supported: those are a client-side feature of dbt-duckdb with no analogue on the server. Setting either will produce a clear compile-time error.
```shell
# Create the virtual environment
python3 -m venv .venv

# Activate the virtual environment
. .venv/bin/activate

pip install --upgrade pip
python -m pip install dbt-core dbt-gizmosql
```

To install from source instead:

```shell
git clone https://github.com/gizmodata/dbt-gizmosql
cd dbt-gizmosql

# Create the virtual environment
python3 -m venv .venv

# Activate the virtual environment
. .venv/bin/activate

# Upgrade pip, setuptools, and wheel
pip install --upgrade pip setuptools wheel

# Install the dbt GizmoSQL adapter - in editable mode with dev dependencies
pip install --editable .[dev]
```

Add the following to your `~/.dbt/profiles.yml` (change values to match your environment):
```yaml
my-gizmosql-db:
  target: dev
  outputs:
    dev:
      type: gizmosql
      host: localhost
      port: 31337
      database: dbt
      user: [username]
      password: [password]
      use_encryption: True
      tls_skip_verify: True
      threads: 2
```

For browser-based OAuth/SSO, use `auth_type: external` -- no username or password needed:
```yaml
my-gizmosql-db:
  target: dev
  outputs:
    dev:
      type: gizmosql
      host: gizmosql.example.com
      port: 31337
      auth_type: external
      use_encryption: True
      threads: 2
```

This adapter connects to GizmoSQL via Apache Arrow Flight SQL using the ADBC driver (`adbc-driver-gizmosql`). Key architectural decisions:
- Autocommit mode: Each statement auto-commits immediately. Flight SQL's `PREPARE` phase validates against committed catalog state, so explicit transactions would cause DDL from earlier statements to be invisible to later ones.
- Client-side DuckDB: Seeds and Python models use a local DuckDB instance for processing, with results shipped to the server via ADBC bulk ingest (Arrow columnar format over gRPC).
- MERGE BY NAME: Incremental merges use DuckDB's `UPDATE BY NAME` / `INSERT BY NAME` syntax, which is resilient to column ordering differences (see the sketch after this list).
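As a rough sketch of what the generated merge looks like (simplified from the description above; the relation and key names are hypothetical and the exact SQL the adapter emits may differ):

```sql
MERGE INTO analytics.fct_orders AS target
USING analytics.fct_orders__dbt_tmp AS source
    ON target.order_id = source.order_id
WHEN MATCHED THEN UPDATE BY NAME
WHEN NOT MATCHED THEN INSERT BY NAME;
```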
This adapter follows semantic versioning. The major.minor version tracks dbt-core (e.g., dbt-core 1.11.x -> dbt-gizmosql 1.11.x).
- Want to report a bug or request a feature? Open an issue
- Want to contribute? Pull requests are welcome
Everyone interacting in the dbt project's codebases, issue trackers, chat rooms, and mailing lists is expected to follow the dbt Code of Conduct.