Predastore, developed by Mulga Defense Corporation, is a distributed, S3-compatible object storage system with Reed-Solomon erasure coding, built for bare-metal, edge, and on-premises deployments. It is the storage backend for Spinifex, an AWS-compatible infrastructure stack for private clouds.
Predastore runs as a distributed cluster with erasure-coded shards, Raft-consensus metadata, and QUIC-based inter-node transport. For development, all nodes run on a single machine using loopback IP aliases.
┌───────────────────────────────────────────────────────────────┐
│                    S3 Client (AWS CLI/SDK)                    │
└──────────────────────────────┬────────────────────────────────┘
                               │
                               ▼
┌───────────────────────────────────────────────────────────────┐
│                   Predastore S3D (HTTP/TLS)                   │
│         Auth (SigV4) · Routing · Backend Abstraction          │
└────────┬───────────────────────────────────┬──────────────────┘
         │                                   │
         ▼                                   ▼
┌─────────────────────┐     ┌──────────────────────────────────┐
│    s3db Cluster     │     │         QUIC Shard Nodes         │
│  (Raft Consensus)   │     │  ┌────────┐┌────────┐┌────────┐  │
│                     │     │  │ Node 0 ││ Node 1 ││ Node 2 │  │
│  BoltDB (Raft log)  │     │  │ Store  ││ Store  ││ Store  │  │
│  BadgerDB (FSM)     │     │  │(seg+ix)││(seg+ix)││(seg+ix)│  │
└─────────────────────┘     │  └────────┘└────────┘└────────┘  │
                            └──────────────────────────────────┘
S3D serves the S3 HTTP API with AWS Signature V4 authentication. The s3db cluster provides strongly consistent metadata via Raft (HashiCorp Raft + BoltDB + BadgerDB). QUIC shard nodes store erasure-coded object data in append-only segment files, with each shard occupying a contiguous extent indexed by a per-node BadgerDB. Inter-node communication uses persistent QUIC connections with pooled, multiplexed streams, which eliminates per-request TLS handshakes.
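The connection-reuse idea can be sketched in Go. This is a minimal illustration under stated assumptions, not Predastore's transport: the `conn` and `pool` types, the `dial` callback, and the stream counter are hypothetical stand-ins for a real QUIC library. The pool dials (and pays for the handshake) once per peer, then hands back the cached connection, so each later shard write only allocates a stream ID. Client-initiated bidirectional QUIC stream IDs advance in steps of 4 (RFC 9000), which the counter mimics.

```go
package main

import (
	"fmt"
	"sync"
)

// conn is a hypothetical stand-in for one long-lived QUIC connection.
type conn struct {
	peer       string
	mu         sync.Mutex
	nextStream uint64
}

// OpenStream allocates the next stream ID on an existing connection.
// No handshake happens here; that cost was paid once, at dial time.
func (c *conn) OpenStream() uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	id := c.nextStream
	c.nextStream += 4 // client-initiated bidirectional stream IDs: 0, 4, 8, ...
	return id
}

// pool keeps exactly one connection per peer and dials lazily.
type pool struct {
	mu    sync.Mutex
	conns map[string]*conn
	dial  func(peer string) (*conn, error) // would perform the TLS handshake
	dials int                              // counts handshakes, for illustration
}

func newPool(dial func(string) (*conn, error)) *pool {
	return &pool{conns: make(map[string]*conn), dial: dial}
}

// Get returns the cached connection for peer, dialing only on first use.
func (p *pool) Get(peer string) (*conn, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if c, ok := p.conns[peer]; ok {
		return c, nil
	}
	c, err := p.dial(peer)
	if err != nil {
		return nil, err
	}
	p.dials++
	p.conns[peer] = c
	return c, nil
}

func main() {
	p := newPool(func(peer string) (*conn, error) { return &conn{peer: peer}, nil })
	for i := 0; i < 3; i++ {
		c, _ := p.Get("node-1")
		fmt.Println("shard write on stream", c.OpenStream())
	}
	fmt.Println("handshakes:", p.dials)
}
```

Holding the pool lock while dialing keeps the sketch short but serializes first-time dials to different peers; a production pool would typically dial outside the lock.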
See DESIGN.md for the full architecture reference, including the data model, QUIC protocol format, Raft consensus details, hash ring placement, and failure handling.
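- Reed-Solomon erasure coding: objects are split into data + parity shards (configurable; e.g. RS(3,2), with 3 data and 2 parity shards, tolerates the loss of any 2 nodes). Raw storage overhead is (K+M)/K, about 1.67× for RS(3,2), instead of the 3× of triple replication.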
- Raft consensus for metadata: bucket and object metadata is strongly consistent across the cluster. Reads can go to any node; writes go through the leader.
- QUIC transport: node-to-node shard I/O uses QUIC over UDP with connection pooling. A single long-lived connection per node pair carries multiplexed streams, so shard writes cost only a stream ID allocation, not a TLS handshake.
- Append-only segments: each shard node writes data to large append-only segment files. A shard occupies a contiguous extent within one segment, pre-allocated to enable lock-free writing to disk. A per-node BadgerDB index maps shard keys to extents.
- AES-256-GCM encryption at rest: every 8 KiB fragment is sealed under a per-fragment GCM nonce, with AAD binding it to its (objectHash, shardIndex, shardNum, fragNum) position, so tamper, replay, and cross-shard splice attempts fail to authenticate. GCM is the sole on-disk integrity authority (no separate CRC). A 32-byte cluster master key is loaded from a `0600` file whose path is supplied via `--encryption-key-file` / `ENCRYPTION_KEY_FILE`.
- Consistent hash ring: shard placement is deterministic via a hash ring with virtual nodes. Adding nodes bumps a ring epoch; old objects stay on the old epoch, and new writes use the new one.
- Single binary: `./bin/s3d` runs one cluster node (S3 API server + Raft database + QUIC shard node). A cluster is N `s3d` processes pointed at the same config; `./scripts/start.sh` launches all of them locally on loopback aliases for development.
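A minimal consistent-hash-ring sketch with virtual nodes and per-epoch rings follows. SHA-256 as the ring hash, the `node#i` virtual-node naming, walking clockwise to pick K+M distinct nodes, and recording the epoch alongside each object are assumptions made for illustration; DESIGN.md describes the actual placement rules.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"sort"
)

// ring is one epoch's view of the cluster (hypothetical type).
type ring struct {
	epoch  uint64
	hashes []uint64          // sorted virtual-node positions
	owner  map[uint64]string // virtual-node position -> physical node
}

func hash64(s string) uint64 {
	sum := sha256.Sum256([]byte(s))
	return binary.BigEndian.Uint64(sum[:8])
}

// newRing places vnodes virtual nodes per physical node on the ring.
func newRing(epoch uint64, nodes []string, vnodes int) *ring {
	r := &ring{epoch: epoch, owner: make(map[uint64]string)}
	for _, n := range nodes {
		for v := 0; v < vnodes; v++ {
			h := hash64(fmt.Sprintf("%s#%d", n, v))
			r.owner[h] = n
			r.hashes = append(r.hashes, h)
		}
	}
	sort.Slice(r.hashes, func(i, j int) bool { return r.hashes[i] < r.hashes[j] })
	return r
}

// place returns up to n distinct physical nodes for key by walking
// clockwise from the key's position and skipping already-chosen nodes.
func (r *ring) place(key string, n int) []string {
	kh := hash64(key)
	start := sort.Search(len(r.hashes), func(i int) bool { return r.hashes[i] >= kh })
	seen := make(map[string]bool)
	var out []string
	for i := 0; i < len(r.hashes) && len(out) < n; i++ {
		node := r.owner[r.hashes[(start+i)%len(r.hashes)]]
		if !seen[node] {
			seen[node] = true
			out = append(out, node)
		}
	}
	return out
}

func main() {
	rings := map[uint64]*ring{
		1: newRing(1, []string{"node0", "node1", "node2", "node3", "node4"}, 64),
	}
	// Each object remembers the epoch it was written under.
	written := map[string]uint64{"my-bucket/file.txt": 1}

	// Adding node5 creates epoch 2; existing objects keep epoch 1.
	rings[2] = newRing(2, []string{"node0", "node1", "node2", "node3", "node4", "node5"}, 64)

	for key, ep := range written {
		fmt.Println(key, "epoch", ep, "->", rings[ep].place(key, 5)) // RS(3,2): 5 shards
	}
	fmt.Println("my-bucket/new.txt epoch 2 ->", rings[2].place("my-bucket/new.txt", 5))
}
```

Because each object keeps the epoch it was written under, adding `node5` changes placement only for new writes; existing objects are still located through the epoch-1 ring.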
Predastore implements key S3 operations compatible with AWS CLI, SDKs, and existing S3 tools:
| Category | Operations |
|---|---|
| Buckets | CreateBucket, DeleteBucket, ListBuckets, HeadBucket |
| Objects | PutObject, GetObject, DeleteObject, HeadObject, ListObjects/V2 |
| Multipart | InitiateMultipartUpload, UploadPart, CompleteMultipartUpload |
| Auth | AWS Signature V4 |
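SigV4 authentication rests on a derived signing key. The sketch below shows the standard AWS key-derivation chain (HMAC-SHA256 over date, region, service, and `aws4_request`) and a constant-time server-side check. Building the canonical request and string to sign is omitted, the region value is an assumption (it must match whatever the client is configured with), and the secret and timestamp are placeholders.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

func hmacSHA256(key []byte, data string) []byte {
	m := hmac.New(sha256.New, key)
	m.Write([]byte(data))
	return m.Sum(nil)
}

// signingKey derives the SigV4 key:
// kDate = HMAC("AWS4"+secret, date), kRegion = HMAC(kDate, region),
// kService = HMAC(kRegion, service), kSigning = HMAC(kService, "aws4_request").
func signingKey(secret, date, region, service string) []byte {
	kDate := hmacSHA256([]byte("AWS4"+secret), date)
	kRegion := hmacSHA256(kDate, region)
	kService := hmacSHA256(kRegion, service)
	return hmacSHA256(kService, "aws4_request")
}

// signature is hex(HMAC(kSigning, stringToSign)).
func signature(kSigning []byte, stringToSign string) string {
	return hex.EncodeToString(hmacSHA256(kSigning, stringToSign))
}

// verify recomputes the signature from the stored secret and compares it
// in constant time, as a server would.
func verify(secret, date, region, stringToSign, got string) bool {
	want := signature(signingKey(secret, date, region, "s3"), stringToSign)
	return hmac.Equal([]byte(want), []byte(got))
}

func main() {
	// Placeholder string to sign; normally it ends with the hex SHA-256
	// of the canonical request (method, path, query, headers, payload hash).
	sts := "AWS4-HMAC-SHA256\n20240101T000000Z\n20240101/us-east-1/s3/aws4_request\n<hash>"
	k := signingKey("EXAMPLE-SECRET", "20240101", "us-east-1", "s3")
	sig := signature(k, sts)
	fmt.Println("signature:", sig)
	fmt.Println("server accepts:", verify("EXAMPLE-SECRET", "20240101", "us-east-1", sts, sig))
}
```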
make build # builds ./bin/s3d (also generates dev TLS certs)
The ./scripts/ directory contains helpers for running a multi-node cluster
locally on loopback IP aliases, the recommended way to exercise the
distributed code paths in development:
./scripts/start.sh 3node # launch a 3-node cluster
./scripts/start.sh -w 5node # launch a 5-node cluster, wait until ready
./scripts/stop.sh # stop all running clusters
./scripts/clean.sh # stop and wipe cluster data
./scripts/bench.sh 3node # run warp benchmark against a cluster
./scripts/bench.sh disk # run raw-disk fio benchmark
Cluster runtime data (logs, PID files, segment files, BadgerDB indexes) lives
under $PREDA_DIR (default /tmp/predastore/<clustername>/). The start script
sets up loopback IP aliases (requires sudo) and generates TLS certs on first
run.
./bin/s3d runs a single cluster node. Use it directly to run one node of a
cluster (e.g. on a dedicated host in production) or to inspect one node in
isolation:
./bin/s3d \
--config config/3node.toml \
--node 1 \
--host 10.11.12.1 \
--port 8443 \
--base-path /tmp/predastore/3node \
--tls-key /tmp/predastore/3node/server.key \
--tls-cert /tmp/predastore/3node/server.pem \
--encryption-key-file /tmp/predastore/3node/master.key
The encryption key file must be exactly 32 raw bytes (no base64, no header)
with mode 0600; group/other-readable keys are rejected outright. Generate
one with ( umask 0177 && openssl rand -out master.key 32 ). The same key
must be supplied to every node in a cluster, and rotating it is not supported
(see Roadmap: envelope encryption).
Cluster configurations live under config/ as TOML files, one per topology:
config/
3node.toml # 3 db + 3 storage nodes
5node.toml # 5 db + 5 storage nodes
7node.toml # 7 db + 7 storage nodes
Each config defines [[db]] and [[storage]] sections specifying node IDs,
hosts, ports, and Reed-Solomon parameters.
TLS certificates are generated on first build, or explicitly with:
make certs # Generate certs/server.{pem,key}
The QUIC inter-node transport and the s3db REST client verify the server
certificate: InsecureSkipVerify is no longer used in either the production
code path or the test fixtures (tests inject an ephemeral CA via
quicclient.SetDefaultRootCAs). Standalone operators must install the cluster
CA into the host trust store before launching s3d; otherwise nodes cannot
dial each other:
# Debian / Ubuntu
sudo cp cluster-ca.pem /usr/local/share/ca-certificates/predastore-cluster-ca.crt
sudo update-ca-certificates
# RHEL / Fedora / Amazon Linux
sudo cp cluster-ca.pem /etc/pki/ca-trust/source/anchors/predastore-cluster-ca.pem
sudo update-ca-trust
When Predastore is deployed by Spinifex, the cluster CA is installed into the host trust store automatically as part of node bootstrap, so no manual action is required.
# Create a bucket
aws --endpoint-url https://10.11.12.1:8443/ s3 mb s3://my-bucket
# Upload a file
aws --endpoint-url https://10.11.12.1:8443/ s3 cp ./file.txt s3://my-bucket/
# List bucket contents
aws --endpoint-url https://10.11.12.1:8443/ s3 ls s3://my-bucket/
# Download a file
aws --endpoint-url https://10.11.12.1:8443/ s3 cp s3://my-bucket/file.txt ./downloaded.txt
Distributed storage with erasure coding, Raft-consensus metadata, and QUIC transport. The simplest way to bring up a cluster locally:
./scripts/start.sh -w 3node # 3-node cluster on loopback aliases
The distributed backend's data model:
| Unit | Size | Description |
|---|---|---|
| Object | arbitrary | RS-encoded end-to-end into K data + M parity shards |
| Shard | ⌈object_size / K⌉ | Per-node RS slice; occupies a contiguous extent |
| Fragment | 32 B header + 8 KiB body + 16 B GCM tag = 8240 B | On-disk unit; AES-256-GCM seals body with AAD bound to (objectHash, shardIndex, shardNum, fragNum) |
| Segment file | up to 4 GiB | Append-only container holding extents from one or more shards |
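The table's arithmetic can be checked with a short sketch. The constants come from the table; treating a partial final fragment as full-size on disk is an assumption, as is the hypothetical `segment` type, which illustrates how pre-allocated extents allow lock-free appends: a single atomic add reserves a contiguous range of offsets.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

const (
	fragHeader = 32
	fragBody   = 8 << 10 // 8 KiB plaintext per fragment
	gcmTag     = 16
	fragOnDisk = fragHeader + fragBody + gcmTag // 8240 bytes
	maxSegment = 4 << 30                         // 4 GiB segment cap
)

// shardSize is ceil(objectSize / k).
func shardSize(objectSize, k int64) int64 { return (objectSize + k - 1) / k }

// fragments is ceil(shard / 8 KiB).
func fragments(shard int64) int64 { return (shard + fragBody - 1) / fragBody }

// extentBytes assumes a partial final fragment occupies a full fragment on disk.
func extentBytes(objectSize, k int64) int64 {
	return fragments(shardSize(objectSize, k)) * fragOnDisk
}

// overhead is the raw storage multiplier (K+M)/K.
func overhead(k, m int) float64 { return float64(k+m) / float64(k) }

// segment reserves extents with one atomic add, so writers never take a lock.
type segment struct{ tail atomic.Int64 }

var errSegmentFull = errors.New("segment full")

func (s *segment) reserve(n int64) (int64, error) {
	end := s.tail.Add(n)
	if end > maxSegment {
		return 0, errSegmentFull // a real allocator would open a new segment
	}
	return end - n, nil
}

func main() {
	const k, m = 3, 2
	obj := int64(10 << 20) // a 10 MiB object
	sh := shardSize(obj, k)
	fmt.Println("shard bytes:", sh, "fragments per shard:", fragments(sh))
	fmt.Println("on-disk extent per shard:", extentBytes(obj, k))
	fmt.Printf("raw overhead for RS(%d,%d): %.2fx\n", k, m, overhead(k, m))

	// Shards from three different objects landing in one node's segment.
	var seg segment
	for i := 0; i < 3; i++ {
		off, err := seg.reserve(extentBytes(obj, k))
		if err != nil {
			fmt.Println(err)
			return
		}
		fmt.Println("extent", i, "at offset", off)
	}
}
```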
See DESIGN.md for full configuration reference, including database node setup, shard node setup, RS tuning, and deployment modes.
Predastore is the default S3 storage provider for Spinifex. When running as part of the Spinifex stack, Predastore integrates via NATS messaging and provides storage for:
- EC2 AMI images β machine images for VM launches
- EBS volume snapshots β via Viperblock, which uses Predastore as its S3-compatible backend
- User data β cloud-init configurations and system artifacts
Predastore subscribes to NATS topics (s3.putobject, s3.getobject, s3.createbucket, etc.), so the rest of the Spinifex control plane can issue S3 operations to it over NATS.
make build # Build s3d binary (also generates TLS certs)
make certs # Generate dev TLS certs
make test # Run tests
make preflight # Full CI checks (lint, govulncheck, tests, race detector)
make clean # Clean build artifacts
make docker_s3d # Build Docker image
make docker_compose_up # Start with docker-compose
make docker_compose_down # Stop services
For distributed mode, increase system socket buffers for QUIC:
sudo sysctl -w net.core.rmem_max=7500000
sudo sysctl -w net.core.wmem_max=7500000
- S3 API core (buckets, objects, multipart)
- AWS Signature V4 authentication
- Distributed storage with Reed-Solomon erasure coding
- Raft-consensus metadata (s3db)
- QUIC transport with connection pooling
- Consistent hash ring placement
- AES-256-GCM encryption at rest (single cluster-wide master key)
- Envelope encryption (master key rotation, per-bucket / per-tenant keys)
- Gossip-based node discovery
- Segment compaction and garbage collection
- Automatic shard rebalancing
- Background read-repair
- Bucket versioning
- Lifecycle policies
Apache 2.0 License. See LICENSE for details.