[Openais] Re: confchg_fn, cluster membership, etc.

Steven Dake sdake at mvista.com
Wed Sep 15 16:05:28 PDT 2004


On Wed, 2004-09-15 at 15:53, Mark Haverkamp wrote:
> On Wed, 2004-09-15 at 15:41, Steven Dake wrote:
> > Mark,
> > 
> > I have had a look at the clm_get_by_nodeid and evt.c and have come to
> > the conclusion that it is not possible to use this function in a
> > configuration change function.
> > 
> > This is because some of the information cannot be gathered until after
> > the configuration change has occurred.  The nodejoin message is never
> > sent until after a configuration change, and the nodejoin message
> > contains the node information (such as the timestamp of the boot, name,
> > etc.).  I'd rather not add all of that information to the gmi form
> > token because we use too much space already for the membership algorithm
> > and it would reduce scalability in the future.
> > 
> > One workaround to this problem is to develop an evt configuration change
> > handler that doesn't use clm_get_by_nodeid.
> > 
> > For the join case, make evt_add_node send an executive message "add a
> > node" (with the ip address) and make the executive message handler match
> > the current body (with message decoding operations) of the evt_add_node
> > function.
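A minimal C sketch of the join workaround, assuming a toy in-process queue stands in for the totally ordered multicast. Only `evt_add_node` comes from the thread; the message id `MESSAGE_REQ_EXEC_EVT_ADD_NODE`, the struct `req_exec_evt_add_node`, `exec_evt_add_node_handler`, and `deliver_all` are hypothetical names, not the openais API. The configuration change path records nothing but the address, and the per-node work moves into the executive handler that runs on delivery.

```c
#include <assert.h>
#include <stddef.h>
#include <netinet/in.h>   /* struct in_addr */

/* Hypothetical executive message: carries only the joining node's address. */
enum { MESSAGE_REQ_EXEC_EVT_ADD_NODE = 1 };

struct req_exec_evt_add_node {
	int id;
	struct in_addr addr;
};

/* Toy stand-in for the ordered multicast: sent messages wait here until
 * deliver_all() hands them to the executive handler in send order. */
#define QUEUE_LEN 16
static struct req_exec_evt_add_node queue[QUEUE_LEN];
static size_t queued;

/* Nodes the event service has set up state for. */
#define MAX_NODES 16
static struct in_addr known_nodes[MAX_NODES];
static size_t known_count;

/* Called from the configuration change handler: no clm_get_by_nodeid(),
 * just "send" an add-node message carrying the address. */
static void evt_add_node(struct in_addr addr)
{
	assert(queued < QUEUE_LEN);
	queue[queued].id = MESSAGE_REQ_EXEC_EVT_ADD_NODE;
	queue[queued].addr = addr;
	queued++;
}

/* Executive handler: holds what used to be the body of evt_add_node.
 * It runs on delivery, outside confchg_fn; per the thread, ordering
 * against the joining node's nodejoin is only guaranteed with the EVS plug. */
static void exec_evt_add_node_handler(const struct req_exec_evt_add_node *req)
{
	assert(known_count < MAX_NODES);
	known_nodes[known_count++] = req->addr;
}

/* Deliver every queued message in the order it was sent. */
static void deliver_all(void)
{
	for (size_t i = 0; i < queued; i++) {
		if (queue[i].id == MESSAGE_REQ_EXEC_EVT_ADD_NODE)
			exec_evt_add_node_handler(&queue[i]);
	}
	queued = 0;
}
```

Nothing about the new node is recorded until delivery, which is the point of moving the work out of the configuration change callback.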
> 
> I was working in roughly that direction.  However, if you look at the
> EVT_CONF_CHANGE case in my patch for bug 43, I had to check for the
> cluster node pointer being NULL because I don't always get a cluster
> node, even there.
> 

Yes, you will need the EVS plug for this scheme to work correctly 100%
of the time.

Regards,
-steve

> 
> > 
> > For the removal case, change evt_remove_node to take a struct in_addr
> > parameter and pass the IP address of the departed node directly to the
> > evt_remove_node function.  The nodeid is all that is needed for the
> > removal case.
> > 
> > The above solution is what is used in clm (clmSendNodeJoin is the
> > function).
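For the removal case, a sketch of an `evt_remove_node` that works from the address alone, so nothing in the configuration change path needs `clm_get_by_nodeid`. The address is passed as a `struct in_addr`, the POSIX type of the `sin_addr` member of `struct sockaddr_in`. The node table, `evt_track_node`, and the `evt_confchg_left` hook (including its `struct sockaddr_in` array argument) are assumptions for illustration; the real confchg_fn signature may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <netinet/in.h>   /* struct in_addr, struct sockaddr_in */

#define MAX_NODES 16

/* Hypothetical per-node record kept by the event service. */
struct evt_node {
	struct in_addr addr;
	int in_use;
};

static struct evt_node evt_nodes[MAX_NODES];

/* Start tracking a node (join side, included so the sketch is usable). */
static void evt_track_node(struct in_addr addr)
{
	for (size_t i = 0; i < MAX_NODES; i++) {
		if (!evt_nodes[i].in_use) {
			evt_nodes[i].addr = addr;
			evt_nodes[i].in_use = 1;
			return;
		}
	}
	assert(!"evt node table full");
}

/* Removal needs only the address, so no clm lookup is made. Returns 1 if
 * a record was dropped, 0 if the address was not being tracked. */
static int evt_remove_node(struct in_addr addr)
{
	for (size_t i = 0; i < MAX_NODES; i++) {
		if (evt_nodes[i].in_use &&
		    evt_nodes[i].addr.s_addr == addr.s_addr) {
			evt_nodes[i].in_use = 0;
			return 1;
		}
	}
	return 0;
}

/* Hypothetical confchg hook: pass each departed processor's sin_addr
 * straight to evt_remove_node. */
static void evt_confchg_left(const struct sockaddr_in *left_list,
			     size_t left_entries)
{
	for (size_t i = 0; i < left_entries; i++)
		evt_remove_node(left_list[i].sin_addr);
}
```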
> > 
> > There may be another solution: defer the work with a timer that has a
> > 0 timeout.  We would first have to verify, though, that a message cannot
> > be delivered before the timer expires.
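The concern with the 0-timeout timer can be illustrated with a toy single-threaded dispatcher. This is a hypothetical model, not the openais poll or timer code: if a message delivery is already queued when the configuration change arms the timer, the message is dispatched first and sees the state from before the deferred work ran.

```c
#include <assert.h>
#include <stddef.h>

/* Toy single-threaded dispatcher (hypothetical, not openais code). */
enum event_kind { EV_MESSAGE, EV_TIMER };

#define MAX_EVENTS 8
static enum event_kind ready[MAX_EVENTS];
static size_t head, tail;

static int deferred_work_done;      /* set when the 0-timeout timer fires */
static int message_saw_stale_state; /* a message ran before the deferred work */

static void post(enum event_kind kind)
{
	assert(tail < MAX_EVENTS);
	ready[tail++] = kind;
}

/* confchg_fn only arms a timer with a 0 timeout; in this model that
 * means the timer event joins the back of the ready queue. */
static void confchg_fn(void)
{
	post(EV_TIMER);
}

/* Run ready events strictly in FIFO order. */
static void dispatch_all(void)
{
	while (head < tail) {
		switch (ready[head++]) {
		case EV_TIMER:
			deferred_work_done = 1;
			break;
		case EV_MESSAGE:
			if (!deferred_work_done)
				message_saw_stale_state = 1;
			break;
		}
	}
}
```

In this model the timer only runs first if no delivery is already queued ahead of it, which is exactly the property that would have to be verified in the real event loop.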
> > 
> > Longer term, we could rearchitect the clm service to execute the
> > configuration change functions for all of the remaining services.  Then
> > it would only execute a configuration change after it had received all
> > of its nodejoins.  The configuration change would instead be a list of
> > SaClmNodeT structures.  I'll think about that, but it doesn't seem too
> > clean at the moment.
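One way the longer-term idea could look, sketched under assumptions: `struct node_record` is a cut-down stand-in for `SaClmNodeT` (not its real layout), and `clm_register_service`, `clm_begin_confchg`, `clm_nodejoin`, and `evt_confchg` are hypothetical names. clm holds the configuration change until every member's nodejoin has arrived, then hands each registered service the complete list exactly once.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_NODES 16
#define MAX_SERVICES 4

/* Cut-down stand-in for SaClmNodeT (hypothetical layout). */
struct node_record {
	unsigned int nodeid;
	char name[32];
	int have_join;          /* nodejoin received for this member */
};

typedef void (*service_confchg_fn)(const struct node_record *nodes,
				   size_t count);

static struct node_record pending[MAX_NODES];
static size_t pending_count;
static int delivered;       /* services already notified for this change */
static service_confchg_fn services[MAX_SERVICES];
static size_t service_count;

static void clm_register_service(service_confchg_fn fn)
{
	assert(service_count < MAX_SERVICES);
	services[service_count++] = fn;
}

/* Membership changed: remember the members but notify no one yet. */
static void clm_begin_confchg(const unsigned int *member_ids, size_t count)
{
	assert(count <= MAX_NODES);
	memset(pending, 0, sizeof(pending));
	for (size_t i = 0; i < count; i++)
		pending[i].nodeid = member_ids[i];
	pending_count = count;
	delivered = 0;
}

/* A nodejoin arrived: fill in that member's details. Once every member
 * has reported, run each service's configuration change exactly once. */
static void clm_nodejoin(unsigned int nodeid, const char *name)
{
	size_t joined = 0;

	for (size_t i = 0; i < pending_count; i++) {
		if (pending[i].nodeid == nodeid) {
			/* size - 1 keeps the zeroed final byte as terminator */
			strncpy(pending[i].name, name, sizeof(pending[i].name) - 1);
			pending[i].have_join = 1;
		}
		if (pending[i].have_join)
			joined++;
	}
	if (!delivered && joined == pending_count) {
		delivered = 1;
		for (size_t s = 0; s < service_count; s++)
			services[s](pending, pending_count);
	}
}

/* Example service callback: every record it receives is complete. */
static size_t evt_last_count;

static void evt_confchg(const struct node_record *nodes, size_t count)
{
	for (size_t i = 0; i < count; i++)
		assert(nodes[i].have_join);
	evt_last_count = count;
}
```

The `delivered` flag keeps a late or duplicate nodejoin from re-running the service callbacks for the same configuration.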
> 
> I think that it would be nice to not have my config call back called
> until the new membership was fully settled and all the information was
> available.
> 
> > 
> > All of this still may not be reliable until the RECOVERY plug is
> > implemented.  The plug prevents any HIGH, MED, or LOW priority message
> > from being transmitted until all processors have delivered all of their
> > RECOVERY messages.  Without the recovery plug, it would be possible for a
> > processor to order a low priority message while another processor is
> > still sending recovery messages.  I think we need to do this before the
> > release this year.  I'll start on this soon...
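A sketch of the RECOVERY plug's admission check, assuming each processor eventually learns which peers have delivered all of their RECOVERY messages (how that is signalled is not shown). The enum values and the names `recovery_start`, `recovery_mark_done`, and `may_transmit` are illustrative, not gmi's actual priority constants or functions.

```c
#include <assert.h>
#include <stddef.h>

/* Priorities as named in the thread; numeric values are arbitrary. */
enum msg_priority { PRIO_RECOVERY, PRIO_HIGH, PRIO_MED, PRIO_LOW };

#define MAX_PROCS 16

/* Hypothetical per-processor flag: set once that processor has
 * delivered all of its RECOVERY messages. */
static int recovery_done[MAX_PROCS];
static size_t proc_count;

/* A new configuration begins recovery on every processor. */
static void recovery_start(size_t processors)
{
	assert(processors <= MAX_PROCS);
	proc_count = processors;
	for (size_t i = 0; i < MAX_PROCS; i++)
		recovery_done[i] = 0;
}

static void recovery_mark_done(size_t proc)
{
	assert(proc < proc_count);
	recovery_done[proc] = 1;
}

/* The plug: RECOVERY traffic always passes; HIGH, MED and LOW are held
 * until every processor has finished delivering its RECOVERY messages. */
static int may_transmit(enum msg_priority prio)
{
	if (prio == PRIO_RECOVERY)
		return 1;
	for (size_t i = 0; i < proc_count; i++)
		if (!recovery_done[i])
			return 0;
	return 1;
}
```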
> > 
> > Thanks
> > -steve
> > 
> > On Fri, 2004-09-10 at 14:27, Mark Haverkamp wrote:
> > > On Fri, 2004-09-10 at 14:18, Steven Dake wrote:
> > > 
> > > > 
> > > > OK, it must be broken.  No other services are using it currently.  Is
> > > > evt.c:1986 (from bk) the line that returns the incorrect values?
> > > 
> > > Yes, that is the place.
> > > 
> > > > 
> > > > I'll have a look at it.
> > > > 
> > > Thanks,
> > > Mark.



