[Openais] Re: confchg_fn, cluster membership, etc.

Mark Haverkamp markh at osdl.org
Fri Sep 10 13:48:13 PDT 2004


On Fri, 2004-09-10 at 13:30, Steven Dake wrote:
> On Fri, 2004-09-10 at 10:23, Mark Haverkamp wrote:
> > Steve,
> > 
> > I've been looking at the configuration change function and what I get
> > when it is called.
> > 
> > When I start the first aisexec, I see that I am the only node.  This
> > makes sense.
> > 
> > When I start a second aisexec, it first sees itself as the only node in
> > the cluster.  (Does this mean that for a short time there are two
> > clusters?) Then the config function gets called again and I see that the
> > first node joined.  The first node on the other hand sees that the
> > second node joined (which seems to be the correct view).  I would think
> > that each node should see the same view of the cluster with regard to
> > who is joining and who was already a member.  Is it possible to have
> > each node have the same idea of who has joined and who was already a
> > member?
> > 
> 
> There is a good reason there are two configuration changes.  The first
> configuration (called a transitional configuration) indicates who has
> left the configuration.  The second configuration (called the regular
> configuration) specifies who has joined the configuration.

Are these transitions visible via the config change function?  What I
see on the second node that I start is this:


L(4): AIS Executive Service: Copyright (C) 2002-2004 MontaVista Software, Inc.
L(4): entering GATHER state.
L(4): SENDING attempt join because this node is ring rep.
New queue for ip 192.168.1.17
L(5): Evt exec init request
L(4): AIS Executive Service: started and ready to receive connections.
L(4): Got attempt join from 192.168.1.8
L(4): CONSENSUS reached!
Got membership form token
Got membership form token
conf_desc_list 2
highest seq 0 0
highest seq 1 0
setting barrier seq to 1
EVS STATE group arut 0 gmi arut 0 highest 0 barrier 1 starting group arut 0
EVS STATE group arut 1 gmi arut 1 highest 0 barrier 1 starting group arut 1
L(4): EVS recovery of messages complete, transitioning to operational.
CONFCHG ENTRIES 1
L(4): CLM CONFIGURATION CHANGE
L(4): New Configuration:
L(4):   192.168.1.17
L(4): Members Left:
L(4): Members Joined:
L(5): Evt conf change
L(5): m 1, j 0, l 0
New queue for ip 192.168.1.8
L(4): CLM CONFIGURATION CHANGE
L(4): New Configuration:
L(4):   192.168.1.8
L(4):   192.168.1.17
L(4): Members Left:
L(4): Members Joined:
L(4):   192.168.1.8
L(5): Evt conf change
L(5): m 2, j 1, l 0
L(4): got nodejoin message 192.168.1.17
L(4): got nodejoin message 192.168.1.8
L(3): Token being retransmitted.
L(3): Token loss in OPERATIONAL.
L(4): entering GATHER state.
L(4): SENDING attempt join because this node is ring rep.
L(4): I am the only member.
L(4): CLM CONFIGURATION CHANGE
L(4): New Configuration:
L(4):   192.168.1.17
L(4): Members Left:
L(4):   192.168.1.8
L(4): Members Joined:
L(5): Evt conf change
L(5): m 1, j 0, l 1

 
The first change reports no joiners; the second shows 192.168.1.8
joining.  The final one is from when I killed the first node.
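
To make the "m, j, l" lines in the log concrete, here is a minimal sketch
of a handler that summarizes one configuration change.  The struct and
function names are hypothetical stand-ins for illustration, not the actual
confchg_fn signature in the executive.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical view of one configuration change: the three lists a
 * handler is told about (members, joined, left), as address strings. */
struct confchg_view {
    const char **members; size_t member_count;
    const char **joined;  size_t joined_count;
    const char **left;    size_t left_count;
};

/* Format the counts the same way the "m 1, j 0, l 0" log lines do.
 * Returns the snprintf result (length that was or would be written). */
int confchg_summary(const struct confchg_view *v, char *buf, size_t len)
{
    return snprintf(buf, len, "m %zu, j %zu, l %zu",
                    v->member_count, v->joined_count, v->left_count);
}

/* True when this change leaves the local node as the only member. */
int confchg_is_singleton(const struct confchg_view *v)
{
    return v->member_count == 1;
}
```

Fed the second change from the log (two members, 192.168.1.8 joined,
nobody left), confchg_summary() produces "m 2, j 1, l 0".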
 


> 
> When a partition is detected, all messages that are part of the old
> configuration are delivered.  When a gap is detected in sequence
> numbers, a transitional configuration is delivered, and then the
> remaining messages that can be delivered are delivered.  Then the
> regular configuration is delivered.  New messages are then delivered
> under the new regular configuration.
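
The ordering described in the quoted paragraph can be sketched roughly as
follows.  This is a simplified model, not the executive's code: the types
and the function are hypothetical, the transitional configuration is always
emitted, and every message received after the gap is treated as
deliverable.

```c
#include <stddef.h>

enum ev_kind { EV_MSG, EV_TRANSITIONAL_CONF, EV_REGULAR_CONF };

/* One delivery to the application: a message (with its sequence number)
 * or a configuration change (seq unused, set to 0). */
struct ev { enum ev_kind kind; unsigned seq; };

/* received[] holds, in ascending order, the sequence numbers of the
 * old-configuration messages this node has; first_seq is the next one
 * it expects.  out[] must have room for n + 2 entries.
 * Returns the number of events written. */
size_t delivery_order(const unsigned *received, size_t n,
                      unsigned first_seq, struct ev *out)
{
    size_t k = 0, i = 0;
    unsigned expect = first_seq;

    /* Contiguous messages are delivered under the old configuration. */
    while (i < n && received[i] == expect) {
        out[k++] = (struct ev){ EV_MSG, received[i] };
        i++;
        expect++;
    }

    /* Gap (or end of messages) reached: transitional configuration. */
    out[k++] = (struct ev){ EV_TRANSITIONAL_CONF, 0 };

    /* Remaining messages are delivered under the transitional one. */
    while (i < n)
        out[k++] = (struct ev){ EV_MSG, received[i++] };

    /* New messages after this point belong to the regular configuration. */
    out[k++] = (struct ev){ EV_REGULAR_CONF, 0 };
    return k;
}
```

With messages 1, 2 and 4 received (3 missing), the order is: message 1,
message 2, transitional configuration, message 4, regular configuration.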

Are you saying that there can be more than one cluster?  I would have
thought that there is only one cluster and that if you weren't in it
before but you are in it now, that you are the new guy and just joined.

> 
> This ensures that messages are delivered under the correct
> configuration.
> 
> Philosophically I don't think it's possible to specify, at least with the
> current vs messaging model, who has joined because of observer
> relativity (see below).  The reason is that two partitions each with 4
> processors could be operating separately, and then merge.  So which would
> be the joining partition, and which would be the partition that was joined?
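
The merge case in the quoted paragraph can be illustrated with a small
sketch.  It assumes, as a simplification, that each node derives its
"joined" list as the new membership minus the membership it held before;
joined_from() is a hypothetical helper, not part of the executive.

```c
#include <stddef.h>
#include <string.h>

/* Copy into out[] every address in now[] that is absent from before[].
 * out[] must have room for nnow entries.  Returns the number copied. */
size_t joined_from(const char **before, size_t nbefore,
                   const char **now, size_t nnow,
                   const char **out)
{
    size_t n = 0;

    for (size_t i = 0; i < nnow; i++) {
        int seen = 0;

        for (size_t k = 0; k < nbefore; k++) {
            if (strcmp(now[i], before[k]) == 0) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            out[n++] = now[i];   /* new from this observer's viewpoint */
    }
    return n;
}
```

Run with the two addresses from the log, the node that previously saw only
192.168.1.17 computes 192.168.1.8 as joined, while the node that previously
saw only 192.168.1.8 computes 192.168.1.17 as joined.  Both results are
consistent with the same merged membership.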

How do you ensure data integrity if two partitions can operate
independently?  What is to stop them from stomping on each other because
they are unaware of each other?  For instance, if the ais lock service gets
implemented, you can't have the two halves of the cluster think that
they can take ownership of the same locks.

> 
> > For a function of the event service, I'd like to know if I'm the new
> > guy.  The way things are I don't think that I can know this.
> > 
> 
> The "new guy" is relative to the observer.  Hence, processor A thinks
> processor B is the new guy, and processor B thinks processor A is the
> new guy.  So who is right?  They are both right, from their observation
> points.  I'm not sure how else to think about this scenario.
> 
> The mechanism I had always believed would work would be to synchronize
> state whenever a new processor is available.  Since, because of the
> relativity of the observer, it is impossible to know who is new, the
> algorithms have to figure out who has the correct data using some form
> of algorithm that allows all processors to agree on the data set.
> 
> It is a weakness in the SA Forum AIS that what happens on a merge and
> partition is unspecified.  Some policies (that are perhaps selectable)
> would be useful for the specifications.  For now, we can do whatever
> works.  I'd be happy enough, for now, if what we had ensured that every
> processor had the same data set.
> 
> > Another thing that I found is that SaClmClusterNodeT data isn't
> > available in the confchg_fn for newly joining nodes.
> > 
> 
> This is true and the reason we added the clm_get_by_nodeid.  

This is what I tried to use.  I get NULL node information returned.

> If this is
> too cumbersome to be useful, we can change the confchg_fn passed to
> executive handlers to take the SaClmClusterNodeT data structure.  I'm
> not too attracted to this idea, because it requires adding information
> about the SaClmClusterNodeT data structure to exec/main.c to formulate
> the data set.
> 
> Regards
> -steve
> > Mark.
-- 
Mark Haverkamp <markh at osdl.org>



