[Openais] Re: recent segfault

Steven Dake sdake at mvista.com
Tue Feb 1 12:48:05 PST 2005


I was thinking another possibility is that after a processor joins a
configuration, it takes the tail end of a previous fragment from another
processor into its assembly area, even though it never saw the start of
that message.  Instead it should wait for the next fragment start and
discard any earlier fragmented data from processors that are new to it.

I think what we need is some kind of value in each message (short int)
which specifies the index in msg_lens[x] where the first fragment starts
for this packet, or 0xffff if this packet contains no starting
fragment.
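
To make that concrete, here is a rough sketch of the receive-side check.
It is not the actual openais code: struct packet_header, struct assembly,
first_start, msg_count, in_sync and first_usable_index() are names made up
for illustration.  Only msg_lens[] and the 0xffff marker come from the
proposal above.

```c
#define MSG_COUNT_MAX 32          /* assumed limit, for the sketch only */
#define NO_FRAGMENT_START 0xffff  /* marker value proposed above */

/* Hypothetical packet header carrying the proposed field. */
struct packet_header {
	unsigned short first_start; /* index in msg_lens[] of the first
	                               message that begins in this packet */
	unsigned short msg_count;   /* number of valid msg_lens[] entries */
	unsigned short msg_lens[MSG_COUNT_MAX];
};

/* Hypothetical per-source assembly state. */
struct assembly {
	int in_sync; /* nonzero once a message start was seen from this source */
};

/*
 * Returns the index of the first msg_lens[] entry that may be copied
 * into the assembly area, or -1 if the whole packet must be discarded.
 */
static int first_usable_index(struct assembly *a,
	const struct packet_header *h)
{
	if (a->in_sync) {
		/* Already following this source: entry 0 continues or
		   starts a message we have seen from its beginning. */
		return 0;
	}
	if (h->first_start == NO_FRAGMENT_START ||
		h->first_start >= h->msg_count) {
		/* Nothing starts in this packet: it is all tail data of
		   a message we never saw the start of. */
		return -1;
	}
	/* Skip the leading tail data and begin at the first start. */
	a->in_sync = 1;
	return h->first_start;
}
```

The sketch assumes in_sync is cleared for every source when a processor
joins a configuration, so the first packets it sees are dropped until one
of them carries a real message start.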

Does this scenario match the configuration change you saw?  I think for
this scenario to be the cause, the crash would have to happen on the
joining processor.

Thoughts?
-steve

On Tue, 2005-02-01 at 13:16, Mark Haverkamp wrote:
> On Tue, 2005-02-01 at 12:54 -0700, Steven Dake wrote:
> > I have an idea on this issue that I was planning to look at yesterday
> > but a few things came up.
> > 
> > The basic idea is that the following scenario happens:
> > 1. fragmented message in assembly area
> > 2. configuration change builds configuration without processor that has
> > fragmented message data in assembly area
> > 3. some messages get sent
> > 4. new configuration change builds configuration with processor that has
> > fragmented message data in assembly area
> > 5. next fragment delivered to message which doesn't apply to that
> > message fragment.
> > 
> > This is just a guess.  There may be some other variation on this idea
> > that is the cause.
> 
> I kind of thought about that too.  That's why I sent you the patch for
> cleaning out the assembly area on nodes that leave.  In this case,
> though, it didn't look like anybody had left.
> 
> Mark.
> 
> 
> > 
> > Thanks
> > -steve
> 
