Skip to content

Possible batch indexing bug in gather_points_kernel and gather_points_grad_kernel #4

Description

@yslbit

Hi, thanks for maintaining this repo.

I have a question about the CUDA launch configuration in sampling_gpu.cu.

In gather_points_kernel, the kernel uses:

int bs_idx = blockIdx.z;
int c_idx = blockIdx.y;
int pt_idx = blockIdx.x * blockDim.x + threadIdx.x;

So it looks like the expected grid layout should be:

`dim3(num_point_blocks, c, b)`

where:
- `blockIdx.x`  indexes chunks of sampled points
- `blockIdx.y` indexes channels
- `blockIdx.z` indexes batches

However, in `gather_points_kernel_wrapper` , the kernel is launched with:
`gather_points_kernel<scalar_t>
    <<<dim3(b, c, 1), opt_n_threads(npoints), 0, stream>>>(...);
`
This means `blockIdx.z` is always `0`, so bs_idx seems to always be `0`. If I understand correctly, this would only process batch `0`, while batches` 1..b-1` would remain unwritten.

The same pattern also appears in `gather_points_grad_kernel_wrapper`.

Should the launch configuration instead be something like:

`unsigned int n_threads = opt_n_threads(npoints);
unsigned int n_blocks = (npoints + n_threads - 1) / n_threads;

gather_points_kernel<scalar_t>
    <<<dim3(n_blocks, c, b), n_threads, 0, stream>>>(...);
`
and similarly for `gather_points_grad_kernel`?

I may be missing something about the intended indexing scheme, so I wanted to ask before opening a PR.

Thanks!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions