[Openais] Re: recent segfault

Daniel McNeil daniel at osdl.org
Tue Feb 1 16:28:24 PST 2005


On Tue, 2005-02-01 at 13:51, Steven Dake wrote:
> On Tue, 2005-02-01 at 14:27, Mark Haverkamp wrote:
> > On Tue, 2005-02-01 at 14:19 -0700, Steven Dake wrote:
> > > On Tue, 2005-02-01 at 14:03, Mark Haverkamp wrote:
> > > > On Tue, 2005-02-01 at 13:48 -0700, Steven Dake wrote:
> > > > > I was thinking another possibility is that after a processor joins a
> > > > > configuration, it takes the end of previous fragment from another
> > > > > processor into its assembly area.  Instead it should start on the next
> > > > > fragment start and discard any previous fragmented data from new
> > > > > processors.
> > > > 
> > > > I think that I see.  What you are saying is that a partial message was
> > > > sent before the processor joined and once it joined it received the last
> > > > piece.  
> > > > > 
> > > > > I think what we need is some kind of value in each message (short int)
> > > > > which specifies the index in msg_lens[x] where the first fragment starts
> > > > > for this packet, or 0xffff if this fragment contains no starting
> > > > > fragment.
> > > > 
> > > > Maybe, along with the fragmented bit (last message is fragment) add a
> > > > continuation bit (first part of buffer is continuation of a previous
> > > > message).  The receiving processor would throw away continuations if its
> > > > assembly area didn't already have something in it.
> > > > 
> > > This is good.  I want to be sure we can handle large MTUs for messages.
> > > This means we need a range of about 0-3000 to specify the start index (2
> > > bytes, plus 1 byte per message with an MTU of 9000).  I'll start working on
> > > a patch integrating the fragment bit and continuation bit into the start
> > > index to save some space.
> > 
> > I'm not following the need for extra bytes. Wouldn't we only need a
> > single bit in the mcast structure like the fragmented bit?  The only
> > message in the incoming buffer that can be a continuation is the first
> > one.  If the assembly index is zero and the continuation bit is set on
> > the incoming message, we just throw away the first message in the
> > incoming buffer and the next one (if any) is the start of a new one.
> > 
> 
> Good idea, Mark.  The patch should be pretty easy to develop.  I'm
> looking at the sort queue in use bug now.  If you want to work up a
> patch for the continuation bit idea, that would be cool.
> 
> It looks like if a message is lost in recovery,
> memb_state_operational_enter may sometimes be called in certain
> conditions after about 1-2 hours of running with RANDOM_DROP enabled.
> This would definitely result in a crash because there would be missing
> messages in the message stream, which a) doesn't follow vs semantics and
> b) would break the assembler.
> 


Steve,

The handling of packed and fragmented messages makes me think of a
potential problem:

If a config change happens in the middle of a large message that has
been fragmented, I'm wondering if the ordering of messages might
be messed up:

Starting with a 2 node cluster (A and B)

A sends out A1

B sends out B1frag1

C joins cluster and sends out C1

A sends out A2

B sends out B1frag2 and B1frag3

I think the above describes what you and Mark are talking about
where C can see the B1frag2 and B1frag3 and not know how to process
it.  Am I understanding this right?
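
For reference, the continuation-bit rule discussed above could be
sketched roughly as follows.  This is only an illustration under
assumptions: the flag names, struct packet, struct assembly and
receive_packet() are hypothetical and are not the actual openais mcast
structures.  Chunks are modelled as C strings to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; not the actual openais mcast layout. */
#define FLAG_CONTINUATION 0x1u /* first chunk continues an earlier message */
#define FLAG_FRAGMENTED   0x2u /* last chunk is finished in a later packet */
#define MAX_CHUNKS 8
#define ASSEMBLY_SIZE 256

struct packet {
    unsigned flags;
    int chunk_count;
    const char *chunks[MAX_CHUNKS]; /* packed message pieces */
};

struct assembly {
    char data[ASSEMBLY_SIZE];
    size_t index; /* 0 means no partial message is held */
};

/*
 * Feed one packet to the assembler.  Each completed message is copied
 * into delivered[] (caller provides MAX_CHUNKS entries).  Returns the
 * number of messages completed by this packet.
 */
static int receive_packet(struct assembly *a, const struct packet *p,
                          char delivered[][ASSEMBLY_SIZE])
{
    int count = 0;

    assert(p->chunk_count >= 0 && p->chunk_count <= MAX_CHUNKS);
    for (int i = 0; i < p->chunk_count; i++) {
        int is_cont = (i == 0) && (p->flags & FLAG_CONTINUATION);
        int is_frag = (i == p->chunk_count - 1) &&
                      (p->flags & FLAG_FRAGMENTED);
        size_t len = strlen(p->chunks[i]);

        /* Tail of a message whose start was never seen: throw it away. */
        if (is_cont && a->index == 0)
            continue;

        if (!is_cont)
            a->index = 0; /* a new message starts here */
        if (a->index + len >= ASSEMBLY_SIZE) {
            a->index = 0; /* too large for this sketch: discard */
            continue;
        }
        memcpy(a->data + a->index, p->chunks[i], len);
        a->index += len;

        if (is_frag)
            continue; /* wait for the rest in a later packet */

        a->data[a->index] = '\0';
        strcpy(delivered[count++], a->data);
        a->index = 0;
    }
    return count;
}
```

A node that joined late starts with index == 0, so the first chunk of a
packet flagged as a continuation is dropped and assembly resumes at the
next chunk.  A node that saw the first fragment has a non-zero index and
keeps appending.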

Now the problem: what is the actual message delivery order?

A sees A1,C1,A2,B1
B sees A1,C1,A2,B1
C sees C1,A2 (with Mark's fix to drop partial fragments).
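
The orders above can be reproduced with a small model of the send
timeline.  This is a sketch under assumptions, not the real protocol:
it assumes every node sees sends in the same total order, that a
message is delivered only when its last fragment arrives, and that only
one fragmented message is in flight at a time.  The names send_event,
timeline, C_JOIN_INDEX and delivery_order() are made up for the
illustration.

```c
#include <stddef.h>
#include <string.h>

struct send_event {
    const char *msg; /* message name */
    int frag;        /* 1-based fragment number */
    int total;       /* number of fragments in the message */
};

/* The scenario above: B1 is split into three fragments. */
static const struct send_event timeline[] = {
    { "A1", 1, 1 },
    { "B1", 1, 3 },
    { "C1", 1, 1 }, /* C joins and sends its first message */
    { "A2", 1, 1 },
    { "B1", 2, 3 },
    { "B1", 3, 3 },
};
#define TIMELINE_LEN (sizeof timeline / sizeof timeline[0])
#define C_JOIN_INDEX 2 /* first event that node C can see */

/*
 * Write the comma-separated delivery order for a node that sees the
 * timeline starting at first_seen.
 */
static void delivery_order(size_t first_seen, char *out, size_t out_size)
{
    const char *partial = NULL; /* message currently being assembled */
    int seen = 0;               /* fragments of it received so far */

    out[0] = '\0';
    for (size_t i = first_seen; i < TIMELINE_LEN; i++) {
        const struct send_event *e = &timeline[i];

        if (e->total > 1) {
            if (e->frag == 1) {
                partial = e->msg;
                seen = 1;
            } else if (partial && strcmp(partial, e->msg) == 0 &&
                       seen == e->frag - 1) {
                seen++;
            } else {
                continue; /* continuation without a start: drop it */
            }
            if (seen < e->total)
                continue; /* message not complete yet */
            partial = NULL;
        }
        if (out[0] != '\0')
            strncat(out, ",", out_size - strlen(out) - 1);
        strncat(out, e->msg, out_size - strlen(out) - 1);
    }
}
```

Calling delivery_order(0, ...) models A and B, which see the whole
timeline; delivery_order(C_JOIN_INDEX, ...) models C, which misses
B1frag1 and therefore never delivers B1.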

So I see 2 problems with this:

1. B1 was started in the old config (A,B) but delivered in the new
    config (A,B,C)

2. C does not see B1 at all, since he only received partial fragments.

Am I misunderstanding the way it works?  If B does not deliver the
entire message B1 before C joins, then we can get the above problems.
Does the protocol give the surviving nodes a chance to send out their
last message in its entirety before allowing a new node to join?

Thanks,

Daniel



