⚠️ Beta software / demo. This is a reference example, not a production service. It exists to show how to stand up a GA4GH Refget Sequences API in Node.js backed by a RefgetStore, using the `@databio/gtars-node` bindings. Expect rough edges and breaking changes.
This implements the GA4GH Refget Sequences API, the standard for retrieving reference sequences by digest. The point of the demo is that it serves data from a RefgetStore instead of a conventional database: a standards-compliant refget server running directly on a content-addressable, file-based store, with no SQL database, no ORM, and no bulk-loading step.
A RefgetStore is a content-addressable, file-based database for biological sequences and sequence collections. Sequences are looked up by GA4GH digest and stored with deduplication and compact encoding, and the store can live on local disk or on static object storage like S3 without a database server. It is written in Rust in the gtars project (the gtars-refget crate) and exposed to JavaScript through the @databio/gtars-node bindings, which is what this server uses to read sequences.
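
To make "looked up by GA4GH digest" concrete, here is a minimal sketch of the `sha512t24u` digest as the GA4GH refget specification describes it: SHA-512 over the sequence, truncated to the first 24 bytes and base64url-encoded. It uses only Node's built-in `crypto` module. The function name and the uppercase normalization step are assumptions for illustration; a RefgetStore computes digests itself when it is built, so this is not code the server runs.

```javascript
const { createHash } = require("node:crypto");

// sha512t24u: SHA-512, keep the first 24 bytes, encode as URL-safe base64.
// 24 bytes encode to exactly 32 characters, so no padding is produced.
function sha512t24u(sequence) {
  const normalized = sequence.toUpperCase(); // assumption: digest the uppercase bases
  return createHash("sha512")
    .update(normalized)
    .digest()
    .subarray(0, 24)
    .toString("base64url");
}

const digest = sha512t24u("ACGT");
console.log(digest.length);      // 32
console.log(digest.slice(0, 2)); // two-character prefix of the digest
```

Because the identifier is derived from the content, two FASTA files that contain the same sequence map to the same stored object, which is what makes deduplication possible.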
The server is a lightweight proxy that never buffers a whole sequence in memory. It either redirects requests for raw-mode sequences to the backing store or stream-decodes packed bytes directly to the HTTP response (see How it works). It also exposes read-only sequence collection endpoints (listing and metadata) as a convenience, but serving sequences is the point; the seqcol comparison endpoint is not implemented.
Learn more about RefgetStore:
- What is RefgetStore? Overview, concepts, and the full list of components
- `@databio/gtars-node`: the Node.js bindings this server is built on (source)
- `gtars`: the Rust engine behind it all

Install dependencies and build, then run the demo:

    npm install
    npm run build

    # Run the demo (builds a store from test FASTAs and starts the server)
    bash demo_up.sh

The server proxies sequence bytes in one of two ways, depending on how the backing RefgetStore is stored:

- Redirect (Raw-mode stores). The server returns `302` with a `Location` header pointing at `<REFGET_STORE_URL>/sequences/<digest[0:2]>/<digest>.seq`. Clients follow the redirect and hit the backing store (typically S3) directly. `Range` headers on the original request flow through to the backing store, which responds with `206 Partial Content`. The server never loads bytes. Query-param partials (`?start=&end=`) are rejected by default; use the `Range` header.
- Stream-decode (Encoded-mode stores). Stored bytes are 2-bit/3-bit packed, so they cannot be redirected verbatim. The server calls `RefgetStore.streamSequence(digest, start, end)`, which returns a `Readable` of decoded ASCII bases that is piped directly to the HTTP response. Memory use is bounded by the stream's internal buffer regardless of sequence size.
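
The two delivery paths above can be sketched as follows. `redirectLocation` builds the `Location` value from the URL layout given in the first bullet, and `streamSequenceTo` pipes a `Readable` to a response using `pipeline` from `node:stream/promises`. Both function names are hypothetical, and `fakeStore` is a stand-in for a real store object with assumed 0-based, end-exclusive coordinates; only the `streamSequence(digest, start, end)` call shape comes from the description above.

```javascript
const { Readable, Writable } = require("node:stream");
const { pipeline } = require("node:stream/promises");

// Redirect path: build the Location for a Raw-mode store, following the
// <REFGET_STORE_URL>/sequences/<digest[0:2]>/<digest>.seq layout.
function redirectLocation(storeUrl, digest) {
  const base = storeUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  return `${base}/sequences/${digest.slice(0, 2)}/${digest}.seq`;
}

// Stream path: pipe decoded bases to the response chunk by chunk, so the
// whole sequence is never held in memory at once.
async function streamSequenceTo(store, digest, start, end, response) {
  await pipeline(store.streamSequence(digest, start, end), response);
}

// Hypothetical stand-in for a real store, for illustration only.
const fakeStore = {
  streamSequence(_digest, start, end) {
    return Readable.from([Buffer.from("ACGTACGT".slice(start, end))]);
  },
};

console.log(redirectLocation("https://example.org/store/", "abCDef"));
// prints "https://example.org/store/sequences/ab/abCDef.seq"

// Collect the streamed output in memory just to show what was written.
const chunks = [];
const sink = new Writable({
  write(chunk, _encoding, done) {
    chunks.push(chunk);
    done();
  },
});
streamSequenceTo(fakeStore, "abCDef", 2, 6, sink).then(() => {
  console.log(Buffer.concat(chunks).toString()); // prints "GTAC"
});
```

In a real handler the sink would be the HTTP response stream, and `pipeline` would also propagate errors and tear down both streams if the client disconnects.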

| Store mode | `REFGET_PROXY_MODE=auto` | `redirect-only` | `stream-only` |
|---|---|---|---|
| Raw | redirect (302) | redirect (302) | stream (decode is a no-op) |
| Encoded | stream | startup error | stream |

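
The matrix above can be read as a small lookup. The sketch below encodes it as a table and resolves the delivery path for a given store mode and `REFGET_PROXY_MODE` value. The function name, the `"Raw"`/`"Encoded"` string values, and the error messages are assumptions for illustration, not the server's actual code.

```javascript
// Delivery path for each (store mode, proxy mode) pair, mirroring the matrix.
const DELIVERY = {
  Raw: { auto: "redirect", "redirect-only": "redirect", "stream-only": "stream" },
  Encoded: { auto: "stream", "redirect-only": "error", "stream-only": "stream" },
};

function resolveDelivery(storeMode, proxyMode = "auto") {
  const outcome = DELIVERY[storeMode] && DELIVERY[storeMode][proxyMode];
  if (!outcome) {
    throw new Error(`Unknown combination: ${storeMode} / ${proxyMode}`);
  }
  if (outcome === "error") {
    // Packed bytes cannot be redirected verbatim, so fail at startup.
    throw new Error("redirect-only cannot serve an Encoded store");
  }
  return outcome;
}

console.log(resolveDelivery("Raw"));                    // prints "redirect"
console.log(resolveDelivery("Encoded"));                // prints "stream"
console.log(resolveDelivery("Raw", "stream-only"));     // prints "stream"
```

Resolving the mode once at startup, rather than per request, is what lets an invalid pairing surface as a startup error instead of a failed request.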

| Env var | Default | Description |
|---|---|---|
| `REFGET_STORE_URL` | (none) | URL to a remote RefgetStore (S3 / HTTP). Required for redirect mode. |
| `REFGET_STORE_PATH` | (none) | Path to a local RefgetStore directory. Forces stream-only mode. |
| `REFGET_CACHE_PATH` | `/tmp/refgetstore_cache` | Metadata cache for remote stores. |
| `REFGET_PROXY_MODE` | `auto` | `auto` (redirect Raw, stream Encoded), `redirect-only`, or `stream-only`. |
| `REFGET_ALLOW_QUERY_PARAM_PARTIALS` | `false` | When `true`, `?start=&end=` requests in redirect mode fall through to streaming instead of returning 400. |
| `PORT` | `3000` | HTTP port. |

Exactly one of `REFGET_STORE_URL` or `REFGET_STORE_PATH` must be set.
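
As a rough illustration of these rules, the sketch below reads the variables from an environment-like object, applies the documented defaults, and enforces the "exactly one" constraint. The `loadConfig` name, the shape of the returned object, and the strict `"true"` comparison are assumptions; the real server may parse these values differently.

```javascript
function loadConfig(env) {
  const hasUrl = Boolean(env.REFGET_STORE_URL);
  const hasPath = Boolean(env.REFGET_STORE_PATH);
  if (hasUrl === hasPath) {
    throw new Error("Set exactly one of REFGET_STORE_URL or REFGET_STORE_PATH");
  }
  return {
    storeUrl: env.REFGET_STORE_URL || null,
    storePath: env.REFGET_STORE_PATH || null,
    cachePath: env.REFGET_CACHE_PATH || "/tmp/refgetstore_cache",
    // A local store path forces stream-only, per the table above.
    proxyMode: hasPath ? "stream-only" : (env.REFGET_PROXY_MODE || "auto"),
    // Assumption: only the literal string "true" enables the fallback.
    allowQueryParamPartials: env.REFGET_ALLOW_QUERY_PARAM_PARTIALS === "true",
    port: Number(env.PORT || 3000),
  };
}

const config = loadConfig({ REFGET_STORE_URL: "https://example.org/store" });
console.log(config.proxyMode); // prints "auto"
console.log(config.port);      // prints 3000
```

In the server itself the argument would be `process.env`; passing a plain object keeps the sketch easy to test.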

| Endpoint | Description |
|---|---|
| `GET /service-info` | GA4GH service-info with store statistics |


| Endpoint | Description |
|---|---|
| `GET /sequence` | List all sequences (disabled for stores with more than 10,000 sequences) |
| `GET /sequence/:digest` | Retrieve sequence bases (302 redirect or streaming, depending on proxy mode). Supports the `Range` header; `?start=&end=` is accepted in stream mode. |
| `GET /sequence/:digest/metadata` | Sequence metadata (length, md5, ga4gh digest) |
| `GET /sequence/service-info` | Refget service capabilities |

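
Since `?start=&end=` is rejected in redirect mode by default, a client that wants a subsequence can send a `Range` header instead, which `GET /sequence/:digest` supports in either mode. The sketch below converts an interval, assumed to be 0-based and end-exclusive, into the inclusive byte range that HTTP expects, and fetches it with the global `fetch` available in Node 18+. The helper names and the `http://localhost:3000` base URL (the default `PORT`) are illustrative assumptions.

```javascript
// HTTP byte ranges are inclusive on both ends, so an end-exclusive
// interval [start, end) becomes "bytes=start-(end - 1)".
function toRangeHeader(start, end) {
  if (!Number.isInteger(start) || !Number.isInteger(end) || start < 0 || end <= start) {
    throw new RangeError(`Invalid interval: [${start}, ${end})`);
  }
  return `bytes=${start}-${end - 1}`;
}

// fetch follows the 302 by default; a partial response arrives as 206,
// which still counts as res.ok.
async function fetchSubsequence(baseUrl, digest, start, end) {
  const res = await fetch(`${baseUrl}/sequence/${digest}`, {
    headers: { Range: toRangeHeader(start, end) },
  });
  if (!res.ok) {
    throw new Error(`Request failed with HTTP ${res.status}`);
  }
  return res.text();
}

console.log(toRangeHeader(0, 10)); // prints "bytes=0-9"

// Example (requires a running server and a digest present in the store):
// fetchSubsequence("http://localhost:3000", "<digest>", 0, 10).then(console.log);
```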

| Endpoint | Description |
|---|---|
| `GET /collection` | List all collections |
| `GET /collection/:digest` | Collection metadata |
| `GET /collection/:digest/metadata` | Collection metadata (explicit) |


To build a store from a FASTA file and serve it locally:

    node scripts/build_store.mjs --fasta path/to/genome.fa --output my_store
    REFGET_STORE_PATH=my_store REFGET_PROXY_MODE=stream-only npm start

Until `@databio/gtars-node` is published with `streamSequence`, link to a local build:

    # In the gtars repo
    cd repos/gtars/gtars-node
    npm run build
    npm link

    # In this repo
    cd repos/refgetstore-node-demo
    npm link @databio/gtars-node
    npm run dev

To build and run the Docker image:

    # Build
    docker build -f deployment/dockerhub/Dockerfile -t refgetstore-server .

    # Run (redirect-mode example)
    docker run -p 80:80 \
      -e REFGET_STORE_URL=https://my-bucket.s3.amazonaws.com/refget/store \
      refgetstore-server

seqcolapi is the companion server in the refget ecosystem: a Python/FastAPI implementation of the GA4GH Sequence Collections API (collection metadata and comparison). It ships as part of the refget Python package and runs in production at seqcolapi.databio.org.
Both speak the GA4GH refget and seqcol APIs, and both can be backed by a RefgetStore. The difference is what they serve, not where they store it:

| | seqcolapi | refgetstore-server (this repo) |
|---|---|---|
| Runtime | Python + FastAPI | Node.js + Hono |
| Storage | PostgreSQL or RefgetStore (local/S3) | RefgetStore only (local/S3) |
| Collection metadata (`/collection`) | ✅ | ✅ |
| Collection comparison (`/comparison`) | ✅ | ❌ (pending napi binding) |
| FASTA DRS / pangenome endpoints | ✅ | ❌ |
| Raw sequence residues (`GET /sequence/:digest` → bases) | ❌ not served | ✅ primary purpose |
| Sequence delivery | n/a | 302 redirect to the backing store, or stream-decode; never buffers bytes |

In short: seqcolapi serves sequence-collection metadata and comparisons, not sequence bases. This server serves the sequence bases themselves, streaming or redirecting them out of a (possibly S3-backed) RefgetStore with no database and no Python.

Limitations:

- No comparison endpoint (`/comparison/:digest1/:digest2`), pending napi binding support
- Read-only: the store must be pre-built from FASTA files