turboquant_implementation

🚀 TurboQuant-PyTorch

An unofficial, end-to-end PyTorch implementation of the Google Research ICLR 2026 paper: [TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate].

✨ Key Features

Faithful QJL Implementation: Implements the 1-bit Quantized Johnson-Lindenstrauss (QJL) transform for unbiased inner-product estimation.
Spherical Lloyd-Max Quantization: Includes the continuous k-means solver for generating optimal codebooks based on the Beta distribution.
Variance-Optimized Projection: Expands the QJL projection dimension ($m = 4d$) to heavily suppress estimator variance before the Softmax bottleneck.
FP16 Value Passthrough (The Outlier Fix): Compresses the Key cache to 3-bit/4-bit while leaving the Value cache in native FP16. This prevents spherical quantization from permanently destroying the massive activation outliers necessary for coherent LLM generation.

The KV cache is the bottleneck for serving LLMs at scale. TurboQuant gives 6x compression with zero quality loss:

6x more concurrent users per GPU — direct 6x reduction in cost per query
6x longer context windows in the same memory budget
No calibration step — compress on-the-fly as tokens stream in
8x speedup on attention at 4-bit on H100 GPUs (less data to load from HBM)

At H100 prices (~$2-3/hr), serving 6x more users per GPU translates to millions in savings at scale.

Requirements: torch, transformers, scipy

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
LICENSE		LICENSE
README.md		README.md
output_1.jpg		output_1.jpg
output_2.jpg		output_2.jpg
turboquant.py		turboquant.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

turboquant_implementation

🚀 TurboQuant-PyTorch

✨ Key Features

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

turboquant_implementation

🚀 TurboQuant-PyTorch

✨ Key Features

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages