[Openais] Re: recent segfault

Steven Dake sdake at mvista.com
Tue Feb 1 13:51:16 PST 2005


On Tue, 2005-02-01 at 14:27, Mark Haverkamp wrote:
> On Tue, 2005-02-01 at 14:19 -0700, Steven Dake wrote:
> > On Tue, 2005-02-01 at 14:03, Mark Haverkamp wrote:
> > > On Tue, 2005-02-01 at 13:48 -0700, Steven Dake wrote:
> > > > I was thinking another possibility is that after a processor joins a
> > > > configuration, it takes the end of previous fragment from another
> > > > processor into its assembly area.  Instead it should start on the next
> > > > fragment start and discard any previous fragmented data from new
> > > > processors.
> > > 
> > > I think that I see.  What you are saying is that a partial message was
> > > sent before the processor joined and once it joined it received the last
> > > piece.  
> > > > 
> > > > I think what we need is some kind of value in each message (short int)
> > > > which specifies the index in msg_lens[x] where the first fragment starts
> > > > for this packet, or 0xffff if this fragment contains no starting
> > > > fragment.
> > > 
> > > Maybe, along with the fragmented bit (last message is a fragment), add a
> > > continuation bit (first part of buffer is a continuation of a previous
> > > message).  The receiving processor would throw away continuations if its
> > > assembly area didn't already have something in it.
> > > 
> > This is good.  I want to be sure we can handle large MTUs for messages. 
> > This means we need a range of about 0-3000 to specify the start index (2
> > bytes, plus 1 byte per message with an MTU of 9000).  I'll start working on
> > a patch integrating the fragment bit and continuation bit into the start
> > index to save some space.
> 
> I'm not following the need for extra bytes. Wouldn't we only need a
> single bit in the mcast structure like the fragmented bit?  The only
> message in the incoming buffer that can be a continuation is the first
> one.  If the assembly index is zero and the continuation bit is set on
> the incoming message, we just throw away the first message in the
> incoming buffer and the next one (if any) is the start of a new one.
> 

Good idea, Mark.  The patch should be pretty easy to develop.  I'm
looking at the sort-queue-in-use bug now.  If you want to work up a
patch for the continuation bit idea, that would be cool.
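
A rough sketch of the receive-side rule Mark describes, just to pin down
the logic.  Every name here (struct packet, struct assembly,
packet_receive, the FLAG_* values, ASSEMBLY_MAX, record_delivery) is
made up for illustration and is not the real totem code; it assumes a
packet carries a flags byte plus a list of message pieces.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* All names and sizes below are hypothetical, for illustration only. */
#define ASSEMBLY_MAX      4096
#define FLAG_FRAGMENTED   0x1 /* last piece in packet is incomplete */
#define FLAG_CONTINUATION 0x2 /* first piece continues an earlier packet */

struct packet {
    unsigned char flags;
    int msg_count;
    const char *msgs[4];
    size_t msg_lens[4];
};

struct assembly {
    char data[ASSEMBLY_MAX];
    size_t index; /* bytes of partial message held; 0 means empty */
};

typedef void (*deliver_fn)(const char *msg, size_t len, void *ctx);

static void packet_receive(struct assembly *a, const struct packet *p,
                           deliver_fn deliver, void *ctx)
{
    int i = 0;
    int is_last;

    /* We hold no earlier fragment, so the first piece is the tail of a
     * message that started before we joined: throw it away. */
    if ((p->flags & FLAG_CONTINUATION) && a->index == 0) {
        i = 1;
    }
    for (; i < p->msg_count; i++) {
        assert(a->index + p->msg_lens[i] <= ASSEMBLY_MAX);
        memcpy(a->data + a->index, p->msgs[i], p->msg_lens[i]);
        a->index += p->msg_lens[i];

        is_last = (i == p->msg_count - 1);
        if (is_last && (p->flags & FLAG_FRAGMENTED)) {
            break; /* keep the partial message for the next packet */
        }
        deliver(a->data, a->index, ctx);
        a->index = 0;
    }
}

/* Example consumer: remembers how many messages arrived and the last one. */
struct delivery_log {
    int count;
    size_t last_len;
    char last[64];
};

static void record_delivery(const char *msg, size_t len, void *ctx)
{
    struct delivery_log *log = ctx;

    log->count++;
    log->last_len = len;
    memcpy(log->last, msg, len < sizeof(log->last) ? len : sizeof(log->last));
}
```

Note that a packet holding a single piece that is both a continuation
and a fragment leaves an empty assembly area empty, so later
continuation pieces of the same message are dropped as well.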

It looks like if a message is lost in recovery,
memb_state_operational_enter may sometimes be called under certain
conditions, after about 1-2 hours of running with RANDOM_DROP enabled. 
This would definitely result in a crash because there would be missing
messages in the message stream, which a) doesn't follow virtual
synchrony (VS) semantics and b) would break the assembler.

Thanks
-steve

> 
> Mark.
> 
> > >  



