Summary
Found via code audit of the CUDA kernel code.
Bug 1: Missing CUDA error check after kernel launch (MEDIUM)
Files:
- `csrc/kda/sm100/kda_fwd_intra_kernel_sm100.hpp`, line 372
- `csrc/kda/sm100/kda_fwd_recomp_w_u_kernel_sm100.hpp`, line 420
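Both SM100 launchers call `CHECK_CUDA(cudaFuncSetAttribute(...))` before the launch but do not check for errors after `kernel_fn<<<...>>>()`. A `<<<...>>>` launch returns no status. A failed launch, such as one with an invalid launch configuration, is reported only through `cudaGetLastError()` / `cudaPeekAtLastError()`, so it goes unnoticed unless that state is queried right after the launch. The repo defines `CHECK_CUDA_KERNEL_LAUNCH()` in `kerutils/common/common.h`, but it is never used.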
Fix: Add `CHECK_CUDA_KERNEL_LAUNCH();` after each kernel launch.
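A `<<<...>>>` launch returns `void`, so the only way to see a launch failure is to query the runtime's per-thread last-error state afterwards. The host-only model below shows that behavior without needing a GPU. `model::launch` stands in for the kernel launch, `model::get_last_error` for `cudaGetLastError()` (which returns the last error and resets it), and `CHECK_LAUNCH_MODEL` for `CHECK_CUDA_KERNEL_LAUNCH()`. All three are hypothetical stand-ins. The actual fix is only the one-line `CHECK_CUDA_KERNEL_LAUNCH();` after each launch.

```cpp
#include <stdexcept>
#include <string>

// Host-only model of CUDA's "last error" behavior. Hypothetical names only:
// the real launch is kernel_fn<<<...>>>(...) and the real query is cudaGetLastError().
namespace model {

thread_local int last_error = 0;  // stands in for the per-thread last-error slot (0 == success)

// Like cudaGetLastError(): return the recorded error and reset it to success.
int get_last_error() {
    int e = last_error;
    last_error = 0;
    return e;
}

// Like a <<<...>>> launch: returns nothing, a bad configuration is only recorded.
void launch(int threads_per_block, int* out) {
    if (threads_per_block > 1024) {  // per-block thread limit on current GPUs
        last_error = 9;              // arbitrary nonzero error code for the model
        return;                      // "kernel" never runs, *out stays untouched
    }
    *out = threads_per_block;  // "kernel" body: write a result
}

}  // namespace model

// Plays the role of CHECK_CUDA_KERNEL_LAUNCH(): check immediately after the launch.
#define CHECK_LAUNCH_MODEL()                                         \
    do {                                                             \
        int err_ = model::get_last_error();                          \
        if (err_ != 0) {                                             \
            throw std::runtime_error("kernel launch failed: " +      \
                                     std::to_string(err_));          \
        }                                                            \
    } while (0)
```

Without the check, the oversized launch leaves `out` unchanged and nothing reports the failure. With it, the error surfaces at the launch site instead of as wrong results later.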
Bug 2: Potential underflow when `sub_seq_len == 0` (MEDIUM)
File: `csrc/kda/sm100/fwd_helpers.hpp`, lines 92-95 and throughout
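When `sub_seq_len == 0`, the clamp `min(X, sub_seq_len - 1)` no longer bounds the index. If `sub_seq_len` is signed, the result is -1 for any non-negative `X`. If it is unsigned, `sub_seq_len - 1` wraps to `UINT_MAX`, so `min` returns `X` unclamped. Either way the resulting index can access shared memory (SMEM) out of bounds.
Fix: Add an early-return guard for `sub_seq_len <= 0` before any such clamp is evaluated.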
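The guard can be sketched on the host, assuming a signed `int` length. `process_rows`, `kTileRows`, and the `std::array` standing in for the SMEM tile are hypothetical. They only model the `min(i, sub_seq_len - 1)` clamp pattern, not the actual code in `fwd_helpers.hpp`. The `static_assert` records the unsigned case.

```cpp
#include <algorithm>
#include <array>
#include <climits>

// Hypothetical tile size; `smem` below stands in for a shared-memory tile.
constexpr int kTileRows = 16;

// Unsigned case: 0 - 1 wraps to UINT_MAX, so min(x, len - 1) would return x
// unchanged and stop clamping anything.
static_assert(0u - 1u == UINT_MAX, "unsigned subtraction wraps around");

// Sums smem[min(i, sub_seq_len - 1)] for i in [0, kTileRows).
// Returns 0 without touching smem when the sub-sequence is empty.
int process_rows(const std::array<int, kTileRows>& smem, int sub_seq_len) {
    if (sub_seq_len <= 0) {
        return 0;  // early-return guard: sub_seq_len - 1 would be -1 here
    }
    int sum = 0;
    for (int i = 0; i < kTileRows; ++i) {
        int row = std::min(i, sub_seq_len - 1);  // now always in [0, kTileRows - 1]
        sum += smem.at(row);                     // .at() would throw if row were out of range
    }
    return sum;
}
```

In a kernel the guard would typically skip the tile's work (or return from the helper) before any index is computed, since device code cannot rely on bounds-checked accessors like `.at()`.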
Bug 3: `exit(1)` in the `CHECK_CUDA` macro kills the Python process (LOW)
File: `csrc/kerutils/include/kerutils/common/common.h`, line 29
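`CHECK_CUDA` calls `exit(1)` on failure, which terminates the entire host process (for a Python extension, the interpreter itself) instead of raising an error the caller can handle. It should throw `KUException` instead, so the error can propagate to Python as a catchable exception.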
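A minimal sketch of a throwing check, assuming `KUException` derives from `std::runtime_error` (the real class in `common.h` may have a different base or constructor, so the one here is a stand-in). To stay compilable without the CUDA toolkit, the status is a plain `int` with 0 meaning success. In the real `CHECK_CUDA`, the comparison would be against `cudaSuccess` and the message would use `cudaGetErrorString()`. `CHECK_STATUS` is a hypothetical name.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Stand-in for the repo's KUException; the actual definition may differ.
class KUException : public std::runtime_error {
public:
    using std::runtime_error::runtime_error;
};

// Throws instead of calling exit(1). `expr` must yield a status where 0 == success.
#define CHECK_STATUS(expr)                                             \
    do {                                                               \
        int status_ = (expr);                                          \
        if (status_ != 0) {                                            \
            std::ostringstream oss_;                                   \
            oss_ << "CUDA error " << status_ << " at " << __FILE__     \
                 << ":" << __LINE__ << " in `" #expr "`";              \
            throw KUException(oss_.str());                             \
        }                                                              \
    } while (0)
```

If the extension is bound with pybind11 (as PyTorch C++ extensions usually are), an uncaught exception derived from `std::runtime_error` is translated into a Python `RuntimeError`, so the caller sees an ordinary exception instead of the interpreter exiting.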
Bug 4: Hardcoded include path in setup.py (LOW)
File: `setup.py`, line 200
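The include path `/usr/local/cuda/include/cccl` is hardcoded, which breaks the build when the CUDA toolkit is installed elsewhere. It should be derived from `CUDA_HOME` instead.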
Found via code audit.