[Openais] Re: recent segfault

Steven Dake sdake at mvista.com
Tue Feb 1 13:19:05 PST 2005


On Tue, 2005-02-01 at 14:03, Mark Haverkamp wrote:
> On Tue, 2005-02-01 at 13:48 -0700, Steven Dake wrote:
> > I was thinking another possibility is that after a processor joins a
> > configuration, it takes the end of previous fragment from another
> > processor into its assembly area.  Instead it should start on the next
> > fragment start and discard any previous fragmented data from new
> > processors.
> 
> I think I see.  What you are saying is that a partial message was
> sent before the processor joined, and once it joined it received the
> last piece.
> > 
> > I think what we need is some kind of value in each message (short int)
> > which specifies the index in msg_lens[x] where the first fragment starts
> > for this packet, or 0xffff if this fragment contains no starting
> > fragment.
> 
> Maybe, along with the fragmented bit (last message is fragment) add a
> continuation bit (first part of buffer is continuation of a previous
> message).  The receiving processor would throw away continuations if its
> assembly area didn't already have something in it.
> 
This is good.  I want to be sure we can handle large MTUs for messages.
This means we need a range of about 0-3000 to specify the start index (2
bytes, plus 1 byte per message with MTU of 9000).  I'll start working on
a patch that folds the fragment bit and continuation bit into the start
index to save some space.
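
To make the idea concrete, here is a rough sketch of one way the packed
field and the receive-side check could look.  This is not the patch, and
everything in it is an assumption for illustration: the macro and function
names, the 14-bit index width (enough for 0-3000), the all-ones index used
as the "no starting fragment" marker in place of 0xffff, and the fixed-size
assembly buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout of a 16-bit per-packet field (illustrative only). */
#define FRAG_START_MASK   0x3fffu /* low 14 bits: index into msg_lens[] of first new message */
#define FRAG_CONTINUATION 0x4000u /* first part of buffer continues a previous message */
#define FRAG_FRAGMENTED   0x8000u /* last message in buffer is a fragment */
#define FRAG_NO_START     0x3fffu /* assumed marker: no message starts in this packet */

#define ASSEMBLY_SIZE 65536       /* assumed size of the assembly area */

/* Pack the start index and the two flag bits into one 16-bit value. */
static uint16_t frag_pack(unsigned start_index, int continuation, int fragmented)
{
    assert(start_index <= FRAG_START_MASK);
    return (uint16_t)(start_index
        | (continuation ? FRAG_CONTINUATION : 0u)
        | (fragmented ? FRAG_FRAGMENTED : 0u));
}

static unsigned frag_start(uint16_t field)
{
    return field & FRAG_START_MASK;
}

static int frag_is_continuation(uint16_t field)
{
    return (field & FRAG_CONTINUATION) != 0;
}

static int frag_is_fragmented(uint16_t field)
{
    return (field & FRAG_FRAGMENTED) != 0;
}

/* Per-sender assembly area; len == 0 means no partial message is pending. */
struct assembly {
    unsigned char data[ASSEMBLY_SIZE];
    size_t len;
};

/*
 * Handle the leading bytes of a packet.  Returns 1 if they were appended
 * to the pending partial message, 0 if they were discarded.  A processor
 * that joined mid-message has an empty assembly area, so the tail of a
 * message it never saw the start of is thrown away.
 */
static int assembly_accept_continuation(struct assembly *a, uint16_t field,
    const unsigned char *buf, size_t buf_len)
{
    if (!frag_is_continuation(field)) {
        return 0; /* nothing in this packet continues an earlier message */
    }
    if (a->len == 0) {
        return 0; /* no earlier piece held: discard the continuation */
    }
    if (buf_len > sizeof(a->data) - a->len) {
        return 0; /* would overflow the assembly area: discard */
    }
    memcpy(a->data + a->len, buf, buf_len);
    a->len += buf_len;
    return 1;
}
```

With this layout a sender would call frag_pack(3000, 1, 0) for a packet
whose first new message starts at msg_lens[3000] and whose leading bytes
finish an earlier message.  A receiver that just joined would drop those
leading bytes and begin assembling at frag_start().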

> > 
> > Does this scenario match the configuration change you saw?  I think for
> > this kind of crash to happen, you would have to see a crash on the
> > joining processor.
> > 
> 
> Things had been running just fine for about 3 hours, then there was a
> token timeout.  In the end, the configuration didn't change.  All four
> processors were still in the configuration.  Although, it was the
> processor that detected the token timeout in the first place that also
> got the segfault. 
> 
>  



