[Openais] [PATCH 2/2] Avoid copying buffers where possible in the Infiniband driver

Zane Bitter zane.bitter at gmail.com
Tue Mar 29 21:00:37 PDT 2011


On 2011/03/29, at 17:18, Steven Dake wrote:

> On 03/28/2011 09:06 PM, Zane Bitter wrote:
>> 
>> On 2011/03/28, at 02:31, Steven Dake wrote:
>> 
>>> 
>>> Yes, the receive buffers handling was changed by your patch.  What
>>> happens if a recv buffer is delivered to totemsrp?  That recv buffer is
>>> put back in the "send buffers free list".  Prior to this change, the
>>> recv buffer would be posted back to the recv queue via ibv_post_recv.
>>> totemsrp doesn't know anything about recv buffer vs send buffer, and I'd
>>> prefer to keep the complexities of rdma out of totemsrp.c.
>>> 
>>> Regards
>>> -steve
>> 
>> OK, that's true, I am assuming that anything passed to iba_mcast_flush_send() or iba_mcast_noflush_send() is actually a send_buf (which is wrong!). I did a hunt through the code to trace back where each buffer that is sent to either of these functions is allocated. Full details are below, but the executive summary:
>> 
>> * buffers passed to totemrrp_token_send() are allocated on the stack or in the totemsrp instance.
>>  - I think we already decided to keep doing the memcpy() in the iba driver for these
>> * buffers passed to totemrrp_mcast_flush_send() are allocated on the stack.
>>  - This makes my patch wrong - we should probably continue to do a memcpy() in the iba driver for these too
>> * buffers passed to totemrrp_mcast_noflush_send() are ultimately allocated by totemsrp_buffer_alloc()
>> 
>> I definitely agree that passing a receive buffer to iba_mcast_noflush_send() would be bad, but if it's happening I can't find where. Intuitively, this seems plausible to me: if we receive a packet we are not going to just transmit it straight away; we need to make a copy in case we need to retransmit it again later, and the net driver will want its buffer back as soon as the deliver call up to totemsrp is finished (in the iba case it gets reposted to the receive queue immediately once iba_deliver_fn() returns).
>> 
> 
> When a message is received via totem, it triggers:
> static int mcast_cq_recv_event_fn (hdb_handle_t poll_handle,  int
> events,  int suck,  void *context)
> 
> This causes the recv buffer to be posted back to the hardware recv
> buffer list:
> mcast_recv_buf_post in the same funtion

Yep, that's still there:
        res = ibv_poll_cq (instance->mcast_recv_cq, 64, wc);
        if (res > 0) {
                for (i = 0; i < res; i++) {
                        iba_deliver_fn (instance, wc[i].wr_id, wc[i].byte_len);
                        mcast_recv_buf_post (instance, wrid2void(wc[i].wr_id)); // <<< This one
                }
        }

And it still posts the receive buffers back to the queue:
static inline int mcast_recv_buf_post (struct totemiba_instance *instance, struct recv_buf *recv_buf)
{
        struct ibv_recv_wr *fail_recv;
        int res;

        res = ibv_post_recv (instance->mcast_cma_id->qp, &recv_buf->recv_wr, &fail_recv);

        return (res);
}


> In your patches, when a message is received via totem, it is delivered,
> and then freed via totemsrp.c:messages_free() (which adds it to a list
> of free messages for the send queue) but never posts it back to the recv
> buffer.

totemsrp.c:messages_free() is freeing the messages that are stored in the regular_sort_queue in the totemsrp instance. Per yesterday's analysis (below), all of those appear to originate from a totemsrp_buffer_alloc() allocation (i.e. they are all transmit buffers).

Are we looking at the same code? I have the patches applied on top of b4bef1cb

cheers,
Zane.

> Regards
> -steve
> 
>> Tracing (somewhat less exhaustively) in the other direction, through the deliver callbacks, I see the following things happening to the receive buffers:
>> message_handler_orf_token() copied to the stack, sometimes retransmitted from there
>> message_handler_mcast() copied to a buffer allocated with totemsrp_buffer_alloc() and that buffer added to a sort queue
>> message_handler_memb_merge_detect() copied to the stack
>> message_handler_memb_join() passed to memb_join_process(), no transmit
>> message_handler_memb_commit_token() copied to the stack, sometimes retransmitted from there
>> message_handler_token_hold_cancel() nothing
>> 
>> But if I am still missing something, I would be _very_ happy to find out where it is ;)
>> 
>> thanks!
>> Zane.
>> 
>> 
>> 
>> totemrrp_token_send()
>> - totemsrp.c:2585 instance->orf_token_retransmit
>> - totemsrp.c:2669 passed to token_send()
>> - totemsrp.c:2848 passed to memb_state_commit_token_send_recovery()
>> - totemsrp.c:2876 instance->commit_token (== instance->commit_token_storage)
>> 
>> token_send()
>> - totemsrp.c:2741 <stack>
>> - totemsrp.c:3619 <stack>
>> 
>> memb_state_commit_token_send_recovery()
>> - totemsrp.c:1992 passed to memb_state_recovery_enter()
>> 
>> memb_state_recovery_enter()
>> - totemsrp.c:4309 alloca()
>> 
>> 
>> totemrrp_mcast_flush_send()
>> - totemsrp.c:2701 <stack>
>> - totemsrp.c:3011 <stack>
>> - totemsrp.c:3081 <stack>
>> - totemsrp.c:3101 <stack>
>> 
>> 
>> totemrrp_mcast_noflush_send()
>> - totemsrp.c:2273 instance->regular_sort_queue
>> - totemsrp.c:2273 instance->recovery_sort_queue
>> - totemsrp.c:2424 instance->retrans_message_queue
>> - totemsrp.c:2424 instance->new_message_queue
>> 
>> instance->regular_sort_queue
>> - totemsrp.c:1691 instance->recovery_sort_queue
>> - totemsrp.c:1790 instance->recovery_sort_queue
>> - totemsrp.c:2422 instance->new_message_queue
>> - totemsrp.c:3869 totemsrp_buffer_alloc()
>> 
>> instance->recovery_sort_queue
>> - totemsrp.c:2422 instance->retrans_message_queue
>> - totemsrp.c:3869 totemsrp_buffer_alloc()
>> 
>> instance->retrans_message_queue
>> - totemsrp.c:2127 totemsrp_buffer_alloc()
>> 
>> instance->new_message_queue
>> - totemsrp.c:2214 totemsrp_buffer_alloc()
>> 
> 



More information about the Openais mailing list