RFC - Expose send params driver bypass#75
RFC - Expose send params driver bypass#75yanivbl6 wants to merge 5 commits intogpudirect:expose_send_paramsfrom
Conversation
| ( ((((v) & 0xffff0000U) >> 16) == GDS_API_MAJOR_VERSION) && \ | ||
| ((((v) & 0x0000ffffU) >> 0 ) >= GDS_API_MINOR_VERSION) ) | ||
|
|
||
| #define IBV_EXP_SEND_GET_INFO (1 << 28) |
There was a problem hiding this comment.
Does the code still keep this flag in send_flags? I think it would be better to use a separate field so that future send flags in libibverbs won't conflict with this definition.
There was a problem hiding this comment.
the problem here is that gds_send_wr is simply ibv_exp_send_wr.
maybe we want to have a new flags arg for gds_prepare_send().
| * Represents a posted send operation on a particular QP | ||
| */ | ||
|
|
||
| #define GDS_SEND_MAX_SGE 16 |
There was a problem hiding this comment.
Is there a way to enforce this limitation? Maybe check it in gds_create_qp?
| * Notes: | ||
| * - TODO. | ||
| */ | ||
| int gds_report_post(struct gds_qp *gqp /*, struct gds_send_wr* wr*/); |
There was a problem hiding this comment.
The function documentation is still missing. I guess this function advances the tracking in the gds_qp struct of the current producer index with the given send wr size?
| for(int j=0; j < swr_info.num_sge; j++) | ||
| { | ||
| gds_dbg("[%s] SGE=%d, Size ptr=0x%08x, Size=%d (0x%08x), +offset=%d\n", | ||
| gds_dbg("[%s] SGE=%d, Size ptr=00x%lx, Size=%d (0x%08x), +offset=%d\n", |
There was a problem hiding this comment.
There's a typo here (00x), and you might want to put debugging print changes in a separate patch, to make the review easier.
| gds_info->sge_list[i].ptr_to_size = (uintptr_t) &(sge->byte_count); | ||
| gds_info->sge_list[i].ptr_to_lkey = (uintptr_t) &(sge->key); | ||
| gds_info->sge_list[i].ptr_to_addr = (uintptr_t) &(sge->addr); | ||
| gds_info->sge_list[i].offset = 0; //why is that here? |
There was a problem hiding this comment.
what does the offset field stand for?
There was a problem hiding this comment.
@e-ago do you remember why you had offset in the first place?
There was a problem hiding this comment.
Where did you find this? Considering file https://github.com/gpudirect/libmlx5/blob/expose_send_params/src/qp.c the offset is modified here
qp->swr_info[qp->cur_swr].sge[qp->swr_info[qp->cur_swr].cur_sge].offset = offset;
and here
swr_info->sge_list[j].offset = qp->swr_info[i].sge[j].offset
There was a problem hiding this comment.
Is the offset field affecting the data written into the wqe in some way?
Co-Authored-By: yanivbl6 <yanivbl@mellanox.com>
drossetti
left a comment
There was a problem hiding this comment.
as a general preexisting remark, using mlx5dv is narrowing the scope of the whole library both to a single vendor and family of adapters.
Either in this change or in a later change in the expose_send_params branch, I think we need to:
- detect mlx5dv and define a macro
- implement the new APIs if the macro is defined, and return a clean error otherwise.
Makefile.am
Outdated
|
|
||
| .cu.o: | ||
| $(NVCC) $(CPPFLAGS) $(AM_CPPFLAGS) $(NVCCFLAGS) $(GENCODE_FLAGS) -c -o $@ $< | ||
| $(NVCC) $(CPPFLAGS) $(AM_LDFLAGS) $(AM_CPPFLAGS) $(NVCCFLAGS) $(GENCODE_FLAGS) -c -o $@ $< |
There was a problem hiding this comment.
-lmlx5 should not be needed here
| ( ((((v) & 0xffff0000U) >> 16) == GDS_API_MAJOR_VERSION) && \ | ||
| ((((v) & 0x0000ffffU) >> 0 ) >= GDS_API_MINOR_VERSION) ) | ||
|
|
||
| #define IBV_EXP_SEND_GET_INFO (1 << 28) |
There was a problem hiding this comment.
the problem here is that gds_send_wr is simply ibv_exp_send_wr.
maybe we want to have a new flags arg for gds_prepare_send().
| gds_info->sge_list[i].ptr_to_size = (uintptr_t) &(sge->byte_count); | ||
| gds_info->sge_list[i].ptr_to_lkey = (uintptr_t) &(sge->key); | ||
| gds_info->sge_list[i].ptr_to_addr = (uintptr_t) &(sge->addr); | ||
| gds_info->sge_list[i].offset = 0; //why is that here? |
There was a problem hiding this comment.
@e-ago do you remember why you had offset in the first place?
|
@yanivbl6 May I ask you to run again the |
|
I have reproduced the error, and will be looking into it. Edit: |
|
I fixed a critical bug (using wrong structure for send wqe), but I still get a validity error on the next iterations. |
|
I ran again the tests. When running while I there is no error when running |
|
I experienced errors with GPU-Memory as well. |
|
I've managed to pass the tests successfully by adding a MPI_Barrier(MPI_COMM_WORLD) in the validate section. I guess there is some race condition but not sure where it is coming from. |
|
Which version of CUDA and NVIDIA driver are you using? |
|
I was using CUDA9.0. I think the driver was 384.90, but not sure. I don't have access to GPU servers at the moment, I will try it when it is available again. I tried understanding why the MPI-Barrier may help but it coudln't. I didn't check the kernel though- is there a way to the dump the cqes polled the GPU? |
|
I deleted the previous comment. I noticed two errors, one related to the original
The validation error may be related to the fact that all the receive requests posted by As a reminder, I've tested everything using:
|
This code should be compared with the upstream/expose_send_params branch
The goal of those commits is to allow the changes in the "Expose send params" branch to work without the required changes to the mlx5 driver. This is done by using direct verbs to extract driver info.
The method to bypass the driver is:
I haven't yet managed to run all the tests in gda successfully (with or without this patch), but I was able to run:
Running gds_kernel_latency, peersync, descriptors, RC
Running gds_kernel_latency, peersync, descriptors, GMEM buffers, RC
without errors/hangs, after changing the hard coded number of batches in the test to 1.
@haggaie @bureddy @drossetti @e-ago