[Openais] Logs during reconfiguration (node lost)

Mark Haverkamp markh at osdl.org
Mon Feb 21 10:28:04 PST 2005


On Mon, 2005-02-21 at 12:19 -0500, Kristen Smith wrote:
> Hi Steve,
> 
> We had some traffic running this weekend (5+1) and one of the nodes
> died (the same aisexec: ../include/sq.h:152: sq_item_get: Assertion
> `sq_position >= 0' failed. that is already reported). In looking
> through the logs when this happened, I am confused about something and
> maybe you can clear this up for me.
> 
> We had 6 nodes (47.104.22.82 - 47.104.22.87) - the failure occurred
> on .84. The reconfig looks the same on 4 of the remaining nodes and
> different on another one. The logs are shown below. 
> 
> My questions are:
> 
> 1) why do all but .86 think that .84 AND .86 went away - .84 died, so
> that makes sense, but why .86 as well? 
> 2) why does .86 think all other nodes went away and it is all by
> itself? 
> 3) both .82 and .86 think they are the rep and create new commit
> tokens - I guess this is because .86 thinks it is in a cluster by
> itself and .82 was the original rep.
> 
> Also, this is just the beginning of the reconfiguration at this time -
> all nodes do multiple reconfigurations after this one caused by the
> failure. I can send all logs along later if you want. Eventually
> (within a second or so after this initial reconfig), all the nodes
> wind up seeing each other and the ring is reformed in a 5+0 scenario.

I have also seen this kind of thing happen.  My guess is that it comes
down to timing: each node's timeouts can expire at slightly different
moments, and the protocol has states in which foreign messages are
ignored, so nodes can briefly end up with different views of the
membership.  Maybe Steve will have a less ambiguous explanation than
this :-).
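
To make the timing argument concrete, here is a toy model in Python.  It
is only a sketch and is not the aisexec/Totem logic: the timeout values,
the GATHER_WINDOW_MS constant and the rule "two nodes see each other only
if their gather windows overlap" are all assumptions made up for
illustration.

```python
# Toy model only: NOT the actual aisexec/Totem logic. All timings are invented.
GATHER_WINDOW_MS = 20  # hypothetical: how long a node listens for join messages

# Hypothetical moment (ms after .84 died) at which each survivor's
# timeout fires. In this made-up scenario .86 fires later than the rest.
timeout_fired_at = {".82": 100, ".83": 101, ".85": 102, ".86": 130, ".87": 101}


def membership_view(node, fired_at, window):
    """Return the nodes whose gather window overlaps with `node`'s window."""
    start = fired_at[node]
    return sorted(
        other for other, t in fired_at.items()
        if abs(t - start) < window
    )


views = {
    node: membership_view(node, timeout_fired_at, GATHER_WINDOW_MS)
    for node in timeout_fired_at
}

for node in sorted(views):
    print(node, "sees", views[node])
```

With these invented numbers, .82, .83, .85 and .87 all agree on a view
without .86, while .86 sees only itself, which is the shape of the split
described in the logs.  A later round with closer timings would bring
everyone back together.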

Mark.

P.S. It's a little difficult to correlate the log messages from
different nodes when their clocks don't appear to be synchronized.
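
If the per-node clock skew can be estimated, one way to line the logs up
is to shift every timestamp onto a common reference clock before
sorting.  The sketch below assumes that approach; the offsets, the
timestamps and the "event A/B/C" messages are placeholders, not taken
from the real logs.

```python
from datetime import datetime, timedelta

# Hypothetical clock offsets (node clock minus reference clock), in seconds.
clock_offset_s = {".82": 0.0, ".86": -1.5}

# Placeholder log entries: (node, local timestamp, message).
entries = [
    (".82", "10:00:05.200", "event A"),
    (".86", "10:00:03.900", "event B"),
    (".82", "10:00:05.450", "event C"),
]


def corrected_time(node, stamp):
    """Convert a node-local timestamp to the reference clock."""
    local = datetime.strptime(stamp, "%H:%M:%S.%f")
    return local - timedelta(seconds=clock_offset_s[node])


# One merged timeline, ordered by corrected time.
timeline = sorted(entries, key=lambda e: corrected_time(e[0], e[1]))

for node, stamp, message in timeline:
    print(corrected_time(node, stamp).time(), node, message)
```

The result is only as good as the offset estimates, so events that land
within the estimation error of each other should still be treated as
unordered.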


-- 
Mark Haverkamp <markh at osdl.org>



