[Openais] patch AMF sync

Hans Feldt Hans.Feldt at ericsson.com
Fri Aug 25 01:05:09 PDT 2006


The AMF sync and node fail over functionality that we are designing at 
the moment is highly dependent on the correct contents of the joined and 
left lists given in the configuration change callback. E.g. if a node 
leaves the cluster, we use the left list to determine which node it was.

When a node leaves, AMF needs to convert the node ID (from the left 
list) via a node name into a node object. This has been difficult since 
TOTEM seems to have forgotten all about the node. The current solution 
saves the TOTEM node ID in the node object. When a node leaves, we 
search for this node ID to find the node object and do the fail over.
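
The lookup described above could look roughly like the sketch below. It 
is only an illustration: struct amf_node, its fields and 
amf_node_find_by_nodeid() are hypothetical names, and the node ID is 
assumed to fit in a 32-bit integer, which is not necessarily how the 
AMF code represents it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified node object; the real AMF structure differs. */
struct amf_node {
    const char *name;       /* AMF node name */
    uint32_t totem_nodeid;  /* TOTEM node ID saved when the node joined */
    int in_cluster;         /* non-zero while the node is a cluster member */
};

/*
 * Linear search for the member node object that holds the given TOTEM
 * node ID. Returns NULL if no current member has that ID.
 */
static struct amf_node *amf_node_find_by_nodeid(struct amf_node *nodes,
                                                size_t count,
                                                uint32_t nodeid)
{
    size_t i;

    assert(nodes != NULL || count == 0);
    for (i = 0; i < count; i++) {
        if (nodes[i].in_cluster && nodes[i].totem_nodeid == nodeid) {
            return &nodes[i];
        }
    }
    return NULL; /* unknown node ID: nothing to fail over */
}
```

Each entry in the left list would be passed to the search, and a 
non-NULL result would be the node object to fail over.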

What you are saying and what I have found is that a node can leave and 
join the cluster without left and joined lists indicating that.
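
To make the case concrete, here is a small sketch. It assumes that the 
left and joined lists amount to the difference between the old and the 
new member list, and it uses a hypothetical struct ring_id with a 
representative and a sequence number. None of these names are taken 
from the TOTEM code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ring identifier: representative plus sequence number. */
struct ring_id {
    uint32_t rep;
    uint64_t seq;
};

/* Count the node IDs that are present in list a but absent from list b. */
static size_t count_missing(const uint32_t *a, size_t na,
                            const uint32_t *b, size_t nb)
{
    size_t i, j, missing = 0;

    for (i = 0; i < na; i++) {
        for (j = 0; j < nb; j++) {
            if (a[i] == b[j]) {
                break;
            }
        }
        if (j == nb) {
            missing++; /* a[i] was not found anywhere in b */
        }
    }
    return missing;
}

/* Non-zero if the two ring identifiers name different configurations. */
static int ring_id_changed(const struct ring_id *prev,
                           const struct ring_id *cur)
{
    return prev->rep != cur->rep || prev->seq != cur->seq;
}
```

If a node restarts quickly enough, the old and the new member list are 
identical, so count_missing() returns 0 in both directions (nothing 
left, nothing joined), while ring_id_changed() still reports a new 
configuration.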

The question is whether such a silent restart is possible in a real 
system. Let's say an init script restarts aisexec if it exits 
abnormally; that could perhaps cause this behaviour. The current AMF 
design cannot handle this and would panic. Other services cannot handle 
it either, I guess?

Maybe it should be an openais requirement that the init script should 
reboot the whole node if aisexec dies? That would give correct left lists.

I suggest we stick with the current solution (using left/joined lists) 
for a while and stabilize the rest of the design instead. What do you 
think?

Regards,
Hans

Steven Dake wrote:
> Hans,
> 
> Yes the member list _and_ the ring id are the only suitable variables to
> be trusted when making sync decisions.
> 
> How do you tell the difference between a processor that has just
> started, and one that left and then rejoins immediately before it is
> detected faulty?  The joining node does indeed send a join message on
> start.  The problem is, there is no history associated with the join
> list.  The only history stored in the system is the ring identifier,
> which is a 64 bit number.
> 
> Consider the example you gave, where the processor joins, then restarts
> before the protocol gives a "left" message.  What should happen in this
> case?  Should a configuration change occur for the left processor in the
> transitional configuration, and then a configuration change occur for
> the same joined processor in the regular configuration?  This seems
> cumbersome, and I find it somewhat difficult to think of a solution to
> implement.
> 
> This was discussed at some length previously, which is why I'd prefer to
> just remove the joined and left lists from the configuration change
> message entirely.  Unfortunately some of the services have dependencies
> on these joined and left lists, so if you use them, use them wisely,
> i.e. sync based upon ring ids, not upon the joining or leaving processors.
> 
> In the long term, the configuration change message will still be
> available but mostly unused, and will change to carry just a member list
> instead of joined/left lists.  Short term, I suggest using the previous
> ring id, which will be unique for each configuration.
> 
> Regards
> -steve
> 
> On Tue, 2006-08-15 at 07:38 +0200, Hans Feldt (AS/EAB) wrote:
>> Are you saying that a service preferably should have a sync protocol
>> that is independent of the joined and left lists in config change? The
>> only list to be trusted is the member list, is that correct?
>>
>> I thought the joining node sent some multicast message saying here I am,
>> please join me if there is a ring out there? Isn't that enough for the
>> other nodes to understand it has been away?
>>
>> Regards,
>> Hans
>>
>>> -----Original Message-----
>>> From: Steven Dake [mailto:sdake at redhat.com] 
>>> Sent: den 15 augusti 2006 02:21
>>> To: Hans Feldt (AS/EAB)
>>> Cc: openais at lists.osdl.org
>>> Subject: RE: [Openais] patch AMF sync
>>>
>>> On Mon, 2006-08-14 at 21:33 +0200, Hans Feldt (AS/EAB) wrote:
>>>> I am performing some hardening of the AMF sync at the moment and have
>>>> fixed a couple of issues. Assert is my friend.
>>>>
>>>> One assert I get is when I kill a node and start it again directly. I
>>>> do get config change callbacks in the other nodes but they say no node
>>>> left and no node joined! Isn't that strange?
>>>>
>>> This is proper behavior.  What happens is a node fails 
>>> (ctrl-c?), and then restarts.  When a node restarts, it 
>>> starts the membership protocol.
>>> Therefore, it appears as though the node never left or joined.
>>>
>>> In fact, there is no way to tell if a node has left or 
>>> joined; it's more of a "here is a list of the processors in 
>>> the configuration".  The left/joined are misnomers and should 
>>> probably be removed, but several people complained when I 
>>> last mentioned it.
>>>
>>> The bottom line is, after every configuration change you must 
>>> do a complete resync of the data.  How do you know who should 
>>> do a resync?
>>> The ring id can be used to identify unique ring 
>>> configurations (and could I suppose be used to determine a 
>>> left and joined list in some strange way).  The way I'd 
>>> suggest doing this is that every processor that gets a 
>>> configuration change checks the ringid.rep field to see if it 
>>> matches this_ip.  If it does, then that node synchronizes 
>>> the data for that part of the ring.
>>>
>>> This could be extended into the sync code so that the sync 
>>> callbacks are only called for nodes that are ring reps, but 
>>> some services don't synchronize in this way.  Therefore it 
>>> would be some work to make changes to them to work in this fashion.
>>>
>>> Regards
>>> -steve
>>>
>>>> Regards,
>>>> Hans
>>>>
>>>>> -----Original Message-----
>>>>> From: openais-bounces at lists.osdl.org 
>>>>> [mailto:openais-bounces at lists.osdl.org] On Behalf Of Hans Feldt
>>>>> Sent: den 11 augusti 2006 14:34
>>>>> To: sdake at redhat.com
>>>>> Cc: openais at lists.osdl.org
>>>>> Subject: Re: [Openais] patch AMF sync
>>>>>
>>>>> Steven Dake wrote:
>>>>>
>>>>>> 10) I suggest using the regular openais_timer_add functions instead of
>>>>>> poll_timer_add.  If these functions have problems (which I think have
>>>>>> been addressed now) then I'd like to know about them so they can
>>>>>> be fixed.  poll_timer_add should only be used by totem.
>>>>>
>>>>> I tried again to use the openais_timer interface but this time totem 
>>>>> locked up and cluster communication did not work; I got a split 
>>>>> brain cluster...
>>>>>
>>>>> Initially the nodes see each other, one node syncs the other but 
>>>>> after that we got the split brain.
>>>>>
>>>>> Therefore AMF still uses the poll_timer interface.
>>>>>
>>>>> My test environment is a 3 node User mode Linux cluster. I have 
>>>>> _not_ tried with a real cluster.
>>>>>
>>>>> Regards,
>>>>> Hans
>>>
> 
> 



