[Openais] patch AMF sync

Hans Feldt (AS/EAB) hans.feldt at ericsson.com
Mon Aug 14 22:38:05 PDT 2006


Are you saying that a service preferably should have a sync protocol
that is independent of the joined and left lists in config change? The
only list to be trusted is the member list, is that correct?

I thought the joining node sent some multicast message saying here I am,
please join me if there is a ring out there? Isn't that enough for the
other nodes to understand it has been away?

Regards,
Hans

> -----Original Message-----
> From: Steven Dake [mailto:sdake at redhat.com] 
> Sent: den 15 augusti 2006 02:21
> To: Hans Feldt (AS/EAB)
> Cc: openais at lists.osdl.org
> Subject: RE: [Openais] patch AMF sync
> 
> On Mon, 2006-08-14 at 21:33 +0200, Hans Feldt (AS/EAB) wrote:
> > I am performing some hardening of the AMF sync at the moment and
> > have fixed a couple of issues. Assert is my friend.
> > 
> > One assert I get is when I kill a node and start it again directly.
> > I do get config change callbacks in the other nodes but they say no
> > node left and no node joined! Isn't that strange?
> > 
> This is proper behavior.  What happens is a node fails (ctrl-c?), and
> then restarts.  When a node restarts, it starts the membership
> protocol.  Therefore, it appears as though the node never left or
> joined.
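
To make the point above concrete, here is a minimal sketch of why a quick restart yields empty left and joined lists. It assumes the lists are derived by comparing the member list before and after the change; `membership_delta` and the addresses are hypothetical names used for illustration, not openais code.

```python
def membership_delta(old_members, new_members):
    """Return (joined, left) by comparing two member lists."""
    old, new = set(old_members), set(new_members)
    return sorted(new - old), sorted(old - new)


# Node 10.0.0.3 is killed and restarted right away, so it is present in
# the member list both before and after the configuration change.
before = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
after = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

joined, left = membership_delta(before, after)
print("joined:", joined, "left:", left)
```

Both lists come out empty even though the restarted node has lost its state, so a service that syncs only when the joined list is non-empty would skip that node.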
> 
> In fact, there is no way to tell if a node has left or joined; it's
> more of a "here is a list of the processors in the configuration".
> The left/joined lists are misnomers and should probably be removed,
> but several people complained when I last mentioned it.
> 
> The bottom line is, after every configuration change you must do a
> complete resync of the data.  How do you know who should do a resync?
> The ring id can be used to identify unique ring configurations (and
> could, I suppose, be used to determine a left and joined list in some
> strange way).  The way I'd suggest doing this is that every processor
> that gets a configuration change checks its ringid.rep field to see
> if it matches this_ip.  If it does, that node synchronizes the data
> for that part of the ring.
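
The suggestion above could be sketched roughly as follows. This is a simplified model, not the openais API: the `RingId` and `Service` classes, the `seq` field, and the callback signature are assumptions made for illustration. Only the comparison of the ring id's `rep` field against `this_ip` is taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RingId:
    rep: str  # address of the ring representative
    seq: int  # assumed: a counter that differs for each new ring


class Service:
    def __init__(self, this_ip):
        self.this_ip = this_ip
        self.syncs_started = []  # ring ids for which this node sent state

    def on_config_change(self, member_list, ring_id):
        """Ignore joined/left; the ring rep resyncs on every change."""
        if ring_id.rep == self.this_ip:
            self.syncs_started.append(ring_id)
            return True   # this node pushes its full state to the ring
        return False      # another node is the rep; wait for its state


members = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
ring = RingId(rep="10.0.0.1", seq=12)
nodes = [Service(ip) for ip in members]
senders = [n.this_ip for n in nodes if n.on_config_change(members, ring)]
print("nodes that start a resync:", senders)
```

Because every node sees the same ring id, exactly one node decides to synchronize, and the decision does not depend on the left and joined lists.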
> 
> This could be extended into the sync code so that the sync 
> callbacks are only called for nodes that are ring reps, but 
> some services don't synchronize in this way.  Therefore it 
> would be some work to make changes to them to work in this fashion.
> 
> Regards
> -steve
> 
> > Regards,
> > Hans
> > 
> > > -----Original Message-----
> > > From: openais-bounces at lists.osdl.org 
> > > [mailto:openais-bounces at lists.osdl.org] On Behalf Of Hans Feldt
> > > Sent: den 11 augusti 2006 14:34
> > > To: sdake at redhat.com
> > > Cc: openais at lists.osdl.org
> > > Subject: Re: [Openais] patch AMF sync
> > > 
> > > Steven Dake wrote:
> > > 
> > > > > 10) I suggest using the regular openais_timer_add functions
> > > > > instead of poll_timer_add.  If these functions have problems
> > > > > (which I think have been addressed now) then I'd like to know
> > > > > about them so they can be fixed.  poll_timer_add should only be
> > > > > used by totem.
> > > 
> > > > I tried again to use the openais_timer interface but this time
> > > > totem locked up and cluster communication did not work; I got a
> > > > split brain cluster...
> > > 
> > > Initially the nodes see each other, one node syncs the other but 
> > > after that we got the split brain.
> > > 
> > > Therefore AMF still uses the poll_timer interface.
> > > 
> > > My test environment is a 3 node User mode Linux cluster. I have 
> > > _not_ tried with a real cluster.
> > > 
> > > Regards,
> > > Hans
> > > 
> > > 
> 
> 



