Introducing multiple send/recv memory buffers in gds_kernel_latency#78
Introducing multiple send/recv memory buffers in gds_kernel_latency#78e-ago wants to merge 2 commits intoexpose_send_paramsfrom
Conversation
| int exp_send_info; | ||
| int validate; | ||
| char *validate_buf; | ||
| int buf_size; |
There was a problem hiding this comment.
white space change: can you not change the alignment of the whole struct ?
There was a problem hiding this comment.
I am still seeing a lot of white space noise, which makes hard to understand which fields are new
tests/gds_kernel_latency.c
Outdated
| int validate; | ||
| char *validate_buf; | ||
| int buf_size; | ||
| int buf_align; |
There was a problem hiding this comment.
I think this var name is incorrect, it suggests alignment requirements, but instead it is an aligned size.
There was a problem hiding this comment.
did you change the name of this member var ?
tests/gds_kernel_latency.c
Outdated
| ctx->txbufexp_addr = NULL; | ||
|
|
||
| size_t alloc_size = max_batch_len * align_to(size + 40, page_size); | ||
| size_t alloc_size = max_batch_len * ctx->buf_align; |
There was a problem hiding this comment.
since you are using memalign/posix_memalign below, why do you need buf_align in the first place?
it should not be needed, as those allocators already provide buffers with the right size.
There was a problem hiding this comment.
This is from previous version of the code. I'll remove it
There was a problem hiding this comment.
Should I keep using size + 40?
There was a problem hiding this comment.
+40 is the padding required for the UD protocol.
You can removed it if we make sure UD cannot be selected in this test.
tests/gds_kernel_latency.c
Outdated
| } else { | ||
| ctx->txbuf = memalign(page_size, ctx->txtot_size); //posix_memalign | ||
| ctx->rxbuf = memalign(page_size, ctx->rxtot_size); | ||
| assert(0 == posix_memalign((void **)&(ctx->txbuf), page_size, ctx->txtot_size)); |
There was a problem hiding this comment.
why switching from memalign to posix_memalign ?
There was a problem hiding this comment.
because memalign is obsolete https://linux.die.net/man/3/memalign
There was a problem hiding this comment.
FYI judging from https://github.com/linux-rdma/rdma-core/blob/1cf909a14b3d07c8a301e3de03bfb91e62aaeff5/libibverbs/examples/ud_pingpong.c#L309 it looks like memalign is still being used.
Here we forked that code, so it is up to us.
Note that switching to posix_memalign() brings a bit different requirements wrt memalign():
"The function posix_memalign() allocates size bytes and places the address of the allocated memory in *memptr. The address of the allocated memory will be a multiple of alignment, which must be a power of two and a multiple of sizeof(void *)."
Also note that buffer size is not checked to be a multiple of alignment.
tests/gds_kernel_latency.c
Outdated
| } else { | ||
| ctx->txbuf = memalign(page_size, ctx->txtot_size); //posix_memalign | ||
| ctx->rxbuf = memalign(page_size, ctx->rxtot_size); | ||
| assert(0 == posix_memalign((void **)&(ctx->txbuf), page_size, ctx->txtot_size)); |
There was a problem hiding this comment.
FYI judging from https://github.com/linux-rdma/rdma-core/blob/1cf909a14b3d07c8a301e3de03bfb91e62aaeff5/libibverbs/examples/ud_pingpong.c#L309 it looks like memalign is still being used.
Here we forked that code, so it is up to us.
Note that switching to posix_memalign() brings a bit different requirements wrt memalign():
"The function posix_memalign() allocates size bytes and places the address of the allocated memory in *memptr. The address of the allocated memory will be a multiple of alignment, which must be a power of two and a multiple of sizeof(void *)."
Also note that buffer size is not checked to be a multiple of alignment.
tests/gds_kernel_latency.c
Outdated
| ctx->txbuf = memalign(page_size, ctx->txtot_size); //posix_memalign | ||
| ctx->rxbuf = memalign(page_size, ctx->rxtot_size); | ||
| assert(0 == posix_memalign((void **)&(ctx->txbuf), page_size, ctx->txtot_size)); | ||
| assert(0 == posix_memalign((void **)&(ctx->rxbuf), page_size, ctx->rxtot_size)); |
There was a problem hiding this comment.
careful with assert() as it is a no-op if NDEBUG is defined.
There was a problem hiding this comment.
I would rather not do this change in this PR, as it is not essential and unrelated to the prototype
There was a problem hiding this comment.
Ok so I'll come back to memalign removing the posix_memalign
tests/gds_kernel_latency.c
Outdated
|
|
||
| if(ctx->exp_send_info == 1) | ||
| { | ||
|
|
There was a problem hiding this comment.
please remove this whitespace change and the others below as they simply add noise
tests/gds_kernel_latency.c
Outdated
| if( ctx->exp_send_info == 1 ) | ||
| { | ||
| free(ctx->txbufexp); | ||
| free(ctx->txbufexp); |
There was a problem hiding this comment.
unneeded whitespace change
tests/gds_kernel_latency.c
Outdated
| free(ctx->txbufexp_addr); | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
why did you move ibv_close_device here?
There was a problem hiding this comment.
it's an oversight, I was debugging a free error
| gpu_warn("[%d] Could not post all receive, requested %d, actually posted %d\n", my_rank, max_batch_len, nrecv); | ||
| return 1; | ||
| } | ||
| // int nrecv = pp_post_recv(ctx, max_batch_len); |
There was a problem hiding this comment.
where is pp_post_recv() being called now?
There was a problem hiding this comment.
Inside pp_post_work before starting the main loop for (i = 0; i < posted_recv; ++i). My initial question was: why there is this additional pp_post_recv outside and before the pp_post_work ?
|
@drossetti Changes in this PR:
Also: what's the purpose of |
|
@drossetti ping |
tests/gds_kernel_latency.c
Outdated
| int validate; | ||
| char *validate_buf; | ||
| int buf_size; | ||
| int buf_align; |
There was a problem hiding this comment.
did you change the name of this member var ?
| int exp_send_info; | ||
| int validate; | ||
| char *validate_buf; | ||
| int buf_size; |
There was a problem hiding this comment.
I am still seeing a lot of white space noise, which makes hard to understand which fields are new
| ctx->txbufexp_size = (uint32_t*)gpu_malloc(page_size, sizeof(uint32_t)*max_batch_len); | ||
| ctx->txbufexp_lkey = (uint32_t*)gpu_malloc(page_size, sizeof(uint32_t)*max_batch_len); | ||
| ctx->txbufexp_addr = (uintptr_t*)gpu_malloc(page_size, sizeof(uintptr_t)*max_batch_len); | ||
| ctx->txbufexp = gpu_malloc(page_size, ctx->txtot_size); |
There was a problem hiding this comment.
for the UD requirement, don't you need +40 even here?
There was a problem hiding this comment.
I suppose that ctx->txtot_size now includes the additional 40B, right?
| ctx->txbufexp_size[i] = ctx->buf_sizeexp; | ||
| ctx->txbufexp_lkey[i] = ctx->mrexp->lkey; | ||
| ctx->txbufexp_addr[i]=(uintptr_t)(ctx->txbufexp+(i*ctx->size_align)); | ||
| gpu_info("exp_send_info - hi=%d, ostmem: new tx size: %d instead of %d. New tx addr: %lx instead of %lx\n", |
There was a problem hiding this comment.
"ostmem" probably missing an 'h'
|
|
||
| for(i=0; i < max_batch_len; i++) | ||
| { | ||
| if (ctx->gpumem) { |
There was a problem hiding this comment.
I find it hard to follow the logic here.
Could you explain why this for loop has that if (ctx->gpumem) ?
@drossetti
rx_flag? Can I remove it?pp_post_recvat line 1541. The firstpp_post_recvare re-posted inpp_post_work