Optimize KDA prefill kernel (SM10X) #16

@icavan

Description

Implement an optimized KDA prefill kernel for SM10X (Blackwell) GPUs.

Context

A fused KDA prefill kernel already exists for SM90 (Hopper). Porting it to SM10X (Blackwell) and optimizing it there would let prefill take advantage of Blackwell's architectural improvements and improve inference performance.

Tasks

  • Port the SM90 fused KDA prefill kernel to SM10X
  • Optimize for Blackwell-specific features (TMEM, etc.)
  • Add correctness tests
  • Benchmark
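
For the correctness-test task, one possible shape is to compare the SM10X kernel's output elementwise against a trusted reference (for example the existing SM90 kernel or a naive implementation) under an absolute-plus-relative tolerance. The sketch below is only an illustration of that idea: `mismatched_indices`, `compare_kernels`, the stub kernels, the test cases, and the default tolerances are all hypothetical and not taken from this repository, and the stubs operate on flat lists of floats rather than real GPU tensors.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical kernel signature: flat inputs in, flat outputs out.
Kernel = Callable[[Sequence[float]], List[float]]


def mismatched_indices(
    actual: Sequence[float],
    expected: Sequence[float],
    atol: float = 1e-2,  # assumed tolerance, tune for the real dtype
    rtol: float = 1e-2,  # assumed tolerance, tune for the real dtype
) -> List[int]:
    """Return indices where |actual - expected| > atol + rtol * |expected|.

    NaN values always count as mismatches, because any comparison
    involving NaN evaluates to False.
    """
    if len(actual) != len(expected):
        raise ValueError("outputs have different lengths")
    return [
        i
        for i, (a, e) in enumerate(zip(actual, expected))
        if not abs(a - e) <= atol + rtol * abs(e)
    ]


def compare_kernels(
    candidate: Kernel,
    reference: Kernel,
    cases: Dict[str, Sequence[float]],
    atol: float = 1e-2,
    rtol: float = 1e-2,
) -> Dict[str, List[int]]:
    """Run both kernels on every case; map failing case names to bad indices."""
    failures: Dict[str, List[int]] = {}
    for name, inputs in cases.items():
        bad = mismatched_indices(candidate(inputs), reference(inputs), atol, rtol)
        if bad:
            failures[name] = bad
    return failures


# Stand-in kernels (placeholders for the real reference and SM10X kernels).
def reference_stub(xs: Sequence[float]) -> List[float]:
    return [0.5 * x for x in xs]


def candidate_stub(xs: Sequence[float]) -> List[float]:
    # Small numerical noise, as a reduced-precision kernel might produce.
    return [0.5 * x + 1e-4 for x in xs]


def broken_stub(xs: Sequence[float]) -> List[float]:
    # Deliberately corrupts element 2 to show how a failure is reported.
    return [float("nan") if i == 2 else 0.5 * x for i, x in enumerate(xs)]


CASES: Dict[str, Sequence[float]] = {
    "short": [1.0, -2.0, 3.0],
    "longer": [float(i) for i in range(8)],
}

if __name__ == "__main__":
    print(compare_kernels(candidate_stub, reference_stub, CASES))
    print(compare_kernels(broken_stub, reference_stub, CASES))
```

In a real test the stubs would be replaced by calls into the actual kernels, and the cases would cover the sequence lengths, head dimensions, and dtypes the kernel is expected to support.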
