High-performance Rust node for the BitSage Network, featuring Obelysk Protocol integration with GPU-accelerated zero-knowledge proofs.
- Verifiable Computation - Prove that GPU computations ran correctly
- TEE Integration - Data encrypted in Trusted Execution Environment
- GPU-Accelerated Proving - 54-174x faster than CPU SIMD
- True Multi-GPU - Thread-safe parallel execution (193% scaling!)
- Minimal Proof Output - Only 32-byte attestation returned
| Proof Size | GPU Compute | SIMD Estimate | Speedup |
|---|---|---|---|
| 2^18 (8MB) | 2.42ms | 132ms | 54.6x |
| 2^20 (32MB) | 5.71ms | 560ms | 98.2x |
| 2^22 (64MB) | 17.73ms | 2.22s | 125.2x |
| 2^23 (64MB) | 25.83ms | 4.5s | 174.2x |
| Metric | Value |
|---|---|
| Throughput | 1,237 proofs/sec |
| Per-proof time | 0.81ms |
| Scaling efficiency | 193% (super-linear!) |
| Hourly capacity | 4.45 million proofs |
| Daily capacity | 107 million proofs |
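The derived rows in the table above follow directly from the measured throughput. The sketch below reproduces that arithmetic; the `Capacity` struct and `capacity` function are illustrative names, not part of the node's code.

```rust
/// Capacity figures derived from a measured throughput (illustrative type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Capacity {
    pub per_proof_ms: f64,
    pub proofs_per_hour: u64,
    pub proofs_per_day: u64,
}

/// Convert a throughput in proofs per second into the derived metrics.
pub fn capacity(proofs_per_sec: u64) -> Capacity {
    Capacity {
        // Average time per proof across all GPUs, in milliseconds.
        per_proof_ms: 1000.0 / proofs_per_sec as f64,
        // 3,600 seconds per hour, 86,400 seconds per day.
        proofs_per_hour: proofs_per_sec * 3_600,
        proofs_per_day: proofs_per_sec * 86_400,
    }
}

fn main() {
    let c = capacity(1_237);
    println!("{:.2} ms per proof", c.per_proof_ms);
    println!("{} proofs/hour", c.proofs_per_hour);
    println!("{} proofs/day", c.proofs_per_day);
}
```

For 1,237 proofs/sec this gives 4,453,200 proofs per hour and 106,876,800 per day, which the table rounds to 4.45 million and 107 million.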
| GPU | Speedup | Proofs/sec | Status |
|---|---|---|---|
| A100 80GB | 45-130x | 127 | Verified |
| H100 80GB | 55-174x | 150 | Verified |
| 4x H100 | 55-174x | 1,237 | Verified |
| Configuration | Proofs/hr | Cost per Proof |
|---|---|---|
| A100 80GB | 457,200 | $0.0000033 |
| H100 80GB | 540,000 | $0.0000056 |
| 4x H100 | 4,453,200 | $0.0000026 |
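Cost per proof is the hourly hardware cost divided by proofs per hour. The README does not state the hourly prices behind these figures, so the sketch below backs them out from the table rather than assuming them; both function names are illustrative.

```rust
/// Cost of one proof given an hourly hardware cost (USD) and hourly throughput.
pub fn cost_per_proof(hourly_cost_usd: f64, proofs_per_hour: u64) -> f64 {
    hourly_cost_usd / proofs_per_hour as f64
}

/// Inverse relation: the hourly cost implied by a published cost per proof.
pub fn implied_hourly_cost(cost_per_proof_usd: f64, proofs_per_hour: u64) -> f64 {
    cost_per_proof_usd * proofs_per_hour as f64
}

fn main() {
    // (configuration, proofs/hr, cost per proof) copied from the table above.
    let rows = [
        ("A100 80GB", 457_200u64, 0.0000033f64),
        ("H100 80GB", 540_000, 0.0000056),
        ("4x H100", 4_453_200, 0.0000026),
    ];
    for (name, proofs_per_hour, cost) in rows {
        let hourly = implied_hourly_cost(cost, proofs_per_hour);
        println!("{}: implied hourly cost ${:.2}", name, hourly);
    }
}
```

To estimate cost per proof for a different provider, pass that provider's hourly rate to `cost_per_proof` together with the measured proofs per hour.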
rust-node/
├── src/
│   ├── obelysk/            # Obelysk Protocol
│   │   ├── prover.rs       # ZK proof generation
│   │   ├── vm.rs           # Obelysk Virtual Machine
│   │   └── stwo_adapter.rs # Stwo GPU integration
│   ├── coordinator/        # Job coordination
│   ├── network/            # P2P networking
│   ├── blockchain/         # Starknet integration
│   └── compute/            # Job execution
└── libs/stwo/              # GPU-accelerated Stwo fork
- Rust nightly
- CUDA Toolkit 12.x (for GPU acceleration)
- NVIDIA GPU (H100 recommended for best performance)
# Standard build (CPU only)
cargo build --release
# Single GPU
cargo build --release --features cuda
# Multi-GPU
cargo build --release --features cuda,multi-gpu
cd libs/stwo
# Production benchmark
cargo run --example obelysk_production_benchmark --features cuda-runtime --release
# H100 comprehensive (all proof sizes)
cargo run --example h100_comprehensive_benchmark --features cuda-runtime --release
# True multi-GPU benchmark (1,237 proofs/sec)
cargo run --example true_multi_gpu_benchmark --features cuda-runtime --release
┌─────────────────────────────────────────────────────────────────┐
│                     Obelysk Proof Pipeline                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  1. Client submits encrypted workload                           │
│                    │                                            │
│                    ▼                                            │
│  2. Data uploaded to GPU (stays in TEE)                         │
│                    │                                            │
│                    ▼                                            │
│  3. GPU computes: FFT → FRI → Merkle                            │
│     (Data NEVER leaves GPU - 174x faster!)                      │
│                    │                                            │
│                    ▼                                            │
│  4. 32-byte proof/attestation returned                          │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│                     MultiGpuExecutorPool (Thread-Safe)                      │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐           │
│  │ Arc<Mutex<Ctx>>  │  │ Arc<Mutex<Ctx>>  │  │ Arc<Mutex<Ctx>>  │  ...      │
│  │ GPU 0            │  │ GPU 1            │  │ GPU 2            │           │
│  │ - Executor       │  │ - Executor       │  │ - Executor       │           │
│  │ - TwiddleCache   │  │ - TwiddleCache   │  │ - TwiddleCache   │           │
│  └──────────────────┘  └──────────────────┘  └──────────────────┘           │
│           │                     │                     │                     │
│           ▼                     ▼                     ▼                     │
│  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐           │
│  │ Thread 0         │  │ Thread 1         │  │ Thread 2         │           │
│  │ Proofs 0,4,8,12  │  │ Proofs 1,5,9,13  │  │ Proofs 2,6,10,14 │           │
│  └──────────────────┘  └──────────────────┘  └──────────────────┘           │
│                                                                             │
│  Result: 1,237 proofs/sec | 4.45M proofs/hour | 107M proofs/day             │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
| Factor | Impact |
|---|---|
| Pre-warmed twiddles | Eliminates ~87ms init overhead |
| True parallelism | Each GPU has own executor |
| No contention | Thread-safe Arc<Mutex<>> per GPU |
| H100 performance | Faster than conservative baseline |
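The pool layout described above (one `Arc<Mutex<...>>` context per GPU, one worker thread per GPU, proofs assigned round-robin) can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the actual `MultiGpuExecutorPool`: `GpuContext`, `prove`, and `run_round_robin` are hypothetical names, and the proving step is a placeholder that does no GPU work.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a per-GPU context (executor plus twiddle cache).
struct GpuContext {
    device_id: usize,
}

impl GpuContext {
    /// Placeholder for real proving; returns a dummy 32-byte attestation.
    /// Takes `&mut self` because a real executor would mutate device state.
    fn prove(&mut self, job_id: usize) -> [u8; 32] {
        let mut out = [0u8; 32];
        out[0] = self.device_id as u8;
        out[1] = job_id as u8;
        out
    }
}

/// Assign job `i` to GPU `i % num_gpus` and run one worker thread per GPU.
/// Returns `(job_id, device_id)` pairs sorted by job id.
pub fn run_round_robin(num_gpus: usize, num_jobs: usize) -> Vec<(usize, usize)> {
    // One independently locked context per GPU, so workers never share a lock.
    let contexts: Vec<Arc<Mutex<GpuContext>>> = (0..num_gpus)
        .map(|device_id| Arc::new(Mutex::new(GpuContext { device_id })))
        .collect();

    let handles: Vec<_> = contexts
        .iter()
        .enumerate()
        .map(|(gpu, ctx)| {
            let ctx = Arc::clone(ctx);
            thread::spawn(move || {
                let mut done = Vec::new();
                // GPU 0 takes jobs 0, 4, 8, ...; GPU 1 takes 1, 5, 9, ...
                for job in (gpu..num_jobs).step_by(num_gpus) {
                    let mut guard = ctx.lock().expect("GPU context mutex poisoned");
                    let proof = guard.prove(job);
                    done.push((job, proof[0] as usize));
                }
                done
            })
        })
        .collect();

    let mut results: Vec<(usize, usize)> = handles
        .into_iter()
        .flat_map(|h| h.join().expect("worker thread panicked"))
        .collect();
    results.sort();
    results
}

fn main() {
    for (job, gpu) in run_round_robin(4, 16) {
        println!("proof {} -> GPU {}", job, gpu);
    }
}
```

Because each thread locks only its own GPU's mutex, the locks are uncontended in this arrangement; the mutex exists so that other code holding a clone of the `Arc` can safely touch the same context.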
# Blockchain
STARKNET_RPC_URL=https://starknet-sepolia.public.blastapi.io
STARKNET_PRIVATE_KEY=0x...
# GPU
CUDA_VISIBLE_DEVICES=0,1,2,3 # For multi-GPU
[server]
port = 8080
host = "0.0.0.0"
[gpu]
enabled = true
device_ids = [0, 1, 2, 3] # Multi-GPU
mode = "throughput" # or "distributed"
# All tests
cargo test
# GPU integration tests
cargo test --features cuda gpu_backend
# Multi-GPU tests
cargo test --features cuda,multi-gpu multi_gpu
- GET /health - Node health status
- GET /gpu/status - GPU availability and stats
- POST /jobs - Submit new job
- GET /jobs/:id - Get job status
- GET /jobs/:id/proof - Get 32-byte proof
- POST /workers/register - Register GPU worker
- GET /workers - List workers with GPU info
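As a quick way to exercise the endpoints listed above, the sketch below sends a plain HTTP/1.1 GET using only the Rust standard library. It assumes the node serves unencrypted HTTP on `127.0.0.1:8080` (the port from the sample `[server]` config); the response format is not documented here, so the raw response is printed as-is. `build_get_request` and `http_get` are illustrative helpers, not part of the node.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Build a minimal HTTP/1.1 GET request for `path` on `host`.
pub fn build_get_request(host: &str, path: &str) -> String {
    format!(
        "GET {} HTTP/1.1\r\nHost: {}\r\nConnection: close\r\n\r\n",
        path, host
    )
}

/// Send the request and return the raw response (status line, headers, body).
pub fn http_get(addr: &str, path: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(build_get_request(addr, path).as_bytes())?;
    // `Connection: close` lets us read until the server closes the socket.
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}

fn main() {
    // Assumed address: localhost with the port from the sample config.
    match http_get("127.0.0.1:8080", "/health") {
        Ok(response) => println!("{}", response),
        Err(err) => eprintln!("node not reachable: {}", err),
    }
}
```

Swapping `/health` for `/gpu/status` or `/workers` queries the other GET endpoints in the same way.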
- stwo-gpu - GPU-accelerated Stwo prover
- BitSage-Cairo-Smart-Contracts - Cairo contracts
- BitSage-WebApp - Web frontend
MIT License - see LICENSE for details.
Built by BitSage Network
Powering verifiable computation with GPU-accelerated ZK proofs
Verified: 1,237 proofs/sec on 4x H100 | 107M proofs/day