[Openais] Split brain when using EVS library

Arne Eriksson R arne.r.eriksson at ericsson.com
Tue Sep 9 06:51:18 PDT 2008


The question is whether EVS detects the rejoin of nodes.
For example, the "losing side" in the merge could receive
EVS_ERR_BAD_HANDLE and be forced to reconnect.
This is only possible if EVS maintains some form of history of
previous ring formations. If EVS drops all membership information
about earlier rings, then of course it cannot help in arbitrating the
merge problem.
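
As a rough illustration of what such a history could look like if it
were kept at application level rather than inside EVS, here is a minimal
Python sketch. Everything in it is an assumption for illustration only:
the names JoinClassifier and losing_side are hypothetical and not part
of the openais EVS API, the per-node "incarnation" value (for example a
boot counter) is something the application would have to exchange in its
own messages, and the "smaller partition loses, ties go against the side
without the lowest node id" rule is just one possible policy.

```python
class JoinClassifier:
    """Hypothetical application-level history of nodes seen in earlier rings."""

    def __init__(self):
        # node id -> incarnation value last seen for that node
        self.known = {}

    def classify(self, joined):
        """Classify joining nodes.

        joined: dict mapping node id -> incarnation for nodes entering the ring.
        Returns a dict mapping node id -> "first_join", "merge" or "restart".
        """
        result = {}
        for node, incarnation in joined.items():
            if node not in self.known:
                # Never seen before: the "normal" join, a sync is enough.
                result[node] = "first_join"
            elif self.known[node] == incarnation:
                # Seen before and not restarted: the ring split and rejoined.
                result[node] = "merge"
            else:
                # Seen before but restarted since: treat like a normal join.
                result[node] = "restart"
            self.known[node] = incarnation
        return result


def losing_side(side_a, side_b):
    """Pick the partition that discards its history (assumed policy).

    Both arguments are non-empty collections of node ids. The smaller
    partition loses; on a tie, the side that does not contain the lowest
    node id loses. The result does not depend on argument order, so both
    partitions reach the same decision.
    """
    a, b = sorted(side_a), sorted(side_b)
    if len(a) != len(b):
        return a if len(a) < len(b) else b
    return b if a[0] < b[0] else a
```

With the six-node scenario quoted below, n1 would classify n4 n5 n6 as
"merge" when they come back with unchanged incarnation values, and both
halves would pick n4 n5 n6 as the losing side under the assumed tie rule.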

Arne

> -----Original Message-----
> From: Robert Wipfel [mailto:RAWIPFEL at novell.com] 
> Sent: den 9 september 2008 14:12
> To: Arne Eriksson R; openais at lists.osdl.org
> Subject: Re: [Openais] Split brain when using EVS library
> 
> >>> On 9/9/2008 at  4:27 AM, in message
> <63E39ADA42BF8B49BEAE3666683A248407342715 at esealmw107.eemea.ericsson.se>,
> "Arne Eriksson R" <arne.r.eriksson at ericsson.com> wrote:
> > Hi,
> > We have a cluster with 6 processors using openais stable version
> > 0.80.3.
> > 
> > For some reason our cluster splits up into two rings.
> > Scenario is:
> > node1(n1) n2 n3 n4 n5 n6 are in the ring.
> > 
> > Suddenly the ring splits into two rings:
> > n1 n2 n3 got leave msg from n4 n5 n6
> > n4 n5 n6 got leave msg from n1 n2 n3
> > 
> > After a few milliseconds the two rings join again:
> > n1 n2 n3 got join msg from n4 n5 n6
> > n4 n5 n6 got join msg from n1 n2 n3
> > 
> > The two rings are merged into one ring again:
> > node1(n1) n2 n3 n4 n5 n6 are in the ring.
> > 
> > The question is whether this is normal EVS behavior in the openais
> > implementation.
> > 
> > The problem is that the application needs to detect the difference
> > between two kinds of joins: the "normal" join, where the two
> > rings/nodes join for the first time, and the "abnormal" join, where a
> > ring has split and rejoined (without any nodes being restarted). The
> > first case typically requires only a sync of some nodes (bringing the
> > history up to date). The second case requires a merge, i.e. selection
> > of a losing side, with the loser discarding its history.
> 
> Sidebar: assuming a shared disk is available somewhere, it can be used
> as a different kind of communication channel for detecting split-brain
> conditions:
> http://wiki.linux-ha.org/SBD_Fencing
> The idea is for the partitions to share membership information and
> detect that a partition exists. Just a thought - hopefully nothing bad
> happened while the partitions were split - in the second case ;-)
> 
> Hth,
> Robert
> 
> 

