
FastLM

We develop fast, lightweight language models for large-scale, distributed, parallel, and sparse scenarios.

Popular repositories

  1. tinyserve-vllm

    [ACM MM 2025 Oral] TinyServe: Query-Aware Page Allocation Optimization

    Python · 10 stars · 2 forks

  2. CSV-Decode

    CSV-Decode: Certifiable Sub-Vocabulary Decoding for Efficient Large Language Model Inference

    Python · 8 stars

  3. HSGM

    [ICPADS 2025 Oral, *SEM 2025 Oral] HSGM: Hierarchical Segment-Graph Memory for Scalable Long-Text Semantics

    Python · 7 stars

  4. SPI_VecDB

    [ICPADS 2025 Oral] Distributed Parallel Multi-Resolution Vector Search

    Go · 7 stars

  5. FastCache

    Forked from NoakLiu/FastCache-xDiT

    FastCache: Fast Caching for Diffusion Transformer Through Learnable Linear Approximation [Efficient ML Model]

    Python · 6 stars

  6. CXL-SpecKV

    [FPGA'26 Oral] CXL-SpecKV: A Disaggregated FPGA Speculative KV-Cache for Datacenter LLM Serving

    C++ · 6 stars · 1 fork
