[Openais] patch AMF sync
Hans Feldt (AS/EAB)
hans.feldt at ericsson.com
Mon Aug 14 22:38:05 PDT 2006
Are you saying that a service preferably should have a sync protocol
that is independent of the joined and left lists in config change? The
only list to be trusted is the member list, is that correct?
I thought the joining node sent some multicast message saying here I am,
please join me if there is a ring out there? Isn't that enough for the
other nodes to understand it has been away?
Regards,
Hans
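
For illustration, a minimal sketch of what relying only on the member list could look like: a service keeps the previous member list and diffs it against the new one itself. The function name and the use of plain 32-bit node addresses are assumptions made for this example, not the openais API. Note that a node killed and restarted quickly appears in both lists, so the diff comes out empty, which matches the "no node left and no node joined" callbacks described below and is why a full resync is still needed after every configuration change.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: copy into 'out' every entry of 'a' that does not
 * appear in 'b', and return how many entries were copied.
 * 'out' must have room for at least 'na' entries.
 *
 *   joined = members_not_in(new_list, old_list)
 *   left   = members_not_in(old_list, new_list)
 */
static size_t members_not_in(const uint32_t *a, size_t na,
                             const uint32_t *b, size_t nb,
                             uint32_t *out)
{
    size_t count = 0;

    for (size_t i = 0; i < na; i++) {
        int found = 0;

        /* Linear search is fine for the small lists a cluster has. */
        for (size_t j = 0; j < nb; j++) {
            if (a[i] == b[j]) {
                found = 1;
                break;
            }
        }
        if (!found) {
            out[count++] = a[i];
        }
    }
    return count;
}
```

Calling it once with (new, old) and once with (old, new) yields locally computed joined and left lists that do not depend on what the config change callback reports.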
> -----Original Message-----
> From: Steven Dake [mailto:sdake at redhat.com]
> Sent: den 15 augusti 2006 02:21
> To: Hans Feldt (AS/EAB)
> Cc: openais at lists.osdl.org
> Subject: RE: [Openais] patch AMF sync
>
> On Mon, 2006-08-14 at 21:33 +0200, Hans Feldt (AS/EAB) wrote:
> > I am performing some hardening of the AMF sync at the moment and have
> > fixed a couple of issues. Assert is my friend.
> >
> > One assert I get is when I kill a node and start it again directly. I
> > do get config change callbacks in the other nodes but they say no node
> > left and no node joined! Isn't that strange?
> >
> This is proper behavior. What happens is a node fails
> (ctrl-c?), and then restarts. When a node restarts, it
> starts the membership protocol.
> Therefore, it appears as though the node never left or joined.
>
> In fact, there is no way to tell if a node has left or
> joined; it's more of a "here is a list of the processors in
> the configuration". The left/joined are misnomers and should
> probably be removed, but several people complained when I
> last mentioned it.
>
> The bottom line is, after every configuration change you must
> do a complete resync of the data. How do you know who should
> do a resync?
> The ring id can be used to identify unique ring
> configurations (and could I suppose be used to determine a
> left and joined list in some strange way). The way I'd
> suggest doing this is for every processor that gets a
> configuration change to check its ringid.rep field to see if it
> matches this_ip. If it does, then have that node synchronize
> the data for that part of the ring.
>
> This could be extended into the sync code so that the sync
> callbacks are only called for nodes that are ring reps, but
> some services don't synchronize in this way. Therefore it
> would be some work to make changes to them to work in this fashion.
>
> Regards
> -steve
>
> > Regards,
> > Hans
> >
> > > -----Original Message-----
> > > From: openais-bounces at lists.osdl.org
> > > [mailto:openais-bounces at lists.osdl.org] On Behalf Of Hans Feldt
> > > Sent: den 11 augusti 2006 14:34
> > > To: sdake at redhat.com
> > > Cc: openais at lists.osdl.org
> > > Subject: Re: [Openais] patch AMF sync
> > >
> > > Steven Dake wrote:
> > >
> > > > > 10) I suggest using the regular openais_timer_add functions instead of
> > > > > poll_timer_add. If these functions have problems (which I think have
> > > > > been addressed now) then I'd like to know about them so they can
> > > > > be fixed. The poll_timer_add function should only be used by totem.
> > >
> > > > I tried again to use the openais_timer interface but this time totem
> > > > locked up and cluster communication did not work, I got a split
> > > > brain cluster...
> > >
> > > Initially the nodes see each other, one node syncs the other but
> > > after that we got the split brain.
> > >
> > > Therefore AMF still uses the poll_timer interface.
> > >
> > > My test environment is a 3 node User mode Linux cluster. I have
> > > _not_ tried with a real cluster.
> > >
> > > Regards,
> > > Hans
> > >
> > >
> > > _______________________________________________
> > > Openais mailing list
> > > Openais at lists.osdl.org
> > > https://lists.osdl.org/mailman/listinfo/openais
> > >
>
>
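
The ring-representative check suggested in the quoted reply (compare ringid.rep with this_ip after a configuration change) could be sketched roughly as follows. The struct layout, field widths and function names are assumptions made for this example; the real openais ring id structure may differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the ring id delivered with a configuration
 * change; not the actual openais definition. */
struct ring_id {
    uint32_t rep;   /* address of the ring representative */
    uint64_t seq;   /* ring sequence number */
};

/* Two ring ids name the same ring configuration only if both the
 * representative and the sequence number match. */
static bool ring_id_equal(const struct ring_id *a, const struct ring_id *b)
{
    return a->rep == b->rep && a->seq == b->seq;
}

/* True when this node is the representative of the ring and should
 * therefore synchronize the data for that part of the ring. */
static bool is_ring_rep(const struct ring_id *ring, uint32_t this_ip)
{
    return ring != NULL && ring->rep == this_ip;
}
```

In a configuration change handler, a service would call is_ring_rep() with its own address and start the resync only when it returns true; ring_id_equal() against the previously seen ring id tells it whether the configuration actually changed.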