[Openais] Re: confchg_fn, cluster membership, etc.

Steven Dake sdake at mvista.com
Wed Sep 15 15:41:11 PDT 2004


Mark,

I have had a look at clm_get_by_nodeid and evt.c and have come to
the conclusion that it is not possible to use this function in a
configuration change function.

This is because some of the information cannot be gathered until after
the configuration change has occurred.  The nodejoin message is never
sent until after a configuration change and the nodejoin message
contains the node information (such as the timestamp of the boot, name,
etc).  I'd rather not add all of that information into the gmi form
token because we use too much space already for the membership algorithm
and it would reduce scalability in the future.

One workaround to this problem is to develop an evt configuration change
handler that doesn't use clm_get_by_nodeid.

For the join case, make evt_add_node send an executive message "add a
node" (with the ip address) and make the executive message handler match
the current body (with message decoding operations) of the evt_add_node
function.

For the removal case, change evt_remove_node to take a struct sin_addr
parameter and pass the ip of the removed node directly to it.  The
nodeid is all that is needed for the removal case.

The above solution is what is used in clm (clmSendNodeJoin is the
function).
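
A minimal sketch of the workaround described above, not the actual evt.c
code.  It assumes nodes are identified by a struct in_addr, models the
executive message queue as a plain array, and uses hypothetical names
(evt_exec_add_node, evt_confchg, deliver_pending, the two tables) for
everything that is not named in this mail.  The point is only that the
configuration change handler never calls clm_get_by_nodeid: joins are
deferred to an executive message and removals act on the address alone.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>

#define MAX_NODES 16                 /* hypothetical table size */

/* Hypothetical stand-in for the evt service's table of known nodes. */
struct in_addr known_nodes[MAX_NODES];
size_t known_count;

/* Hypothetical queue of "add a node" executive messages not yet delivered. */
struct in_addr pending_adds[MAX_NODES];
size_t pending_count;

/* Join case: only queue an executive message carrying the ip address. */
void evt_add_node(struct in_addr addr)
{
    assert(pending_count < MAX_NODES);
    pending_adds[pending_count++] = addr;
}

/* Executive message handler: does the real work after the config change. */
void evt_exec_add_node(struct in_addr addr)
{
    size_t i;

    for (i = 0; i < known_count; i++)
        if (known_nodes[i].s_addr == addr.s_addr)
            return;                  /* already known, nothing to do */
    assert(known_count < MAX_NODES);
    known_nodes[known_count++] = addr;
}

/* Removal case: the address is all that is needed, so act immediately. */
void evt_remove_node(struct in_addr addr)
{
    size_t i;

    for (i = 0; i < known_count; i++) {
        if (known_nodes[i].s_addr == addr.s_addr) {
            known_nodes[i] = known_nodes[--known_count];
            return;
        }
    }
}

/* Configuration change handler that never calls clm_get_by_nodeid. */
void evt_confchg(const struct in_addr *joined, size_t n_joined,
                 const struct in_addr *left, size_t n_left)
{
    size_t i;

    for (i = 0; i < n_left; i++)
        evt_remove_node(left[i]);
    for (i = 0; i < n_joined; i++)
        evt_add_node(joined[i]);
}

/* Simulates ordered delivery of the queued executive messages. */
void deliver_pending(void)
{
    size_t i;

    for (i = 0; i < pending_count; i++)
        evt_exec_add_node(pending_adds[i]);
    pending_count = 0;
}
```

In this sketch a joined node only shows up in known_nodes once
deliver_pending has run, which mirrors the executive message arriving
after the configuration change.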

Another solution may be to add a timer with a 0 timeout, but we would
have to verify that a message cannot be delivered before the timer
expires.

Longer term, we could rearchitect the clm service to execute the
configuration change functions for all of the remaining services.  Then
it would only execute a configuration change after it had received all
of its nodejoins.  The configuration change would instead be a list of
SaClmNodeT structures.  I'll think about that, but it doesn't seem too
clean at the moment.
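
To make the longer term idea concrete, here is a rough sketch under
stated assumptions.  struct node_info is a simplified stand-in for
SaClmNodeT, and every function and variable name below is hypothetical.
clm collects nodejoins and only calls the other services' configuration
change functions, with the full member list, once the last expected
nodejoin has arrived.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES    8               /* hypothetical limits */
#define MAX_SERVICES 4

/* Simplified stand-in for SaClmNodeT; the fields are illustrative only. */
struct node_info {
    unsigned int node_id;
    const char *name;
};

/* A service's configuration change function receives the member list. */
typedef void (*confchg_list_fn)(const struct node_info *list, size_t count);

confchg_list_fn handlers[MAX_SERVICES];
size_t handler_count;

struct node_info members[MAX_NODES];
size_t members_received;
size_t members_expected;

/* Each remaining service registers its handler with clm. */
void clm_register_service(confchg_list_fn fn)
{
    assert(handler_count < MAX_SERVICES);
    handlers[handler_count++] = fn;
}

/* Called on a configuration change: start waiting for nodejoins. */
void clm_confchg_begin(size_t expected)
{
    assert(expected <= MAX_NODES);
    members_expected = expected;
    members_received = 0;
}

/* Called per nodejoin; returns 1 once the services were notified. */
int clm_nodejoin_received(struct node_info node)
{
    size_t i;

    assert(members_received < members_expected);
    members[members_received++] = node;
    if (members_received < members_expected)
        return 0;                    /* still waiting for more nodejoins */
    for (i = 0; i < handler_count; i++)
        handlers[i](members, members_received);
    return 1;
}

/* Example service handler: records what it was told. */
size_t example_calls;
size_t example_last_count;

void example_confchg(const struct node_info *list, size_t count)
{
    (void)list;
    example_calls++;
    example_last_count = count;
}
```

With this shape a service handler is never run against partial
membership data, at the cost of routing every service's configuration
change through clm.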

All of this may still not be reliable until the RECOVERY plug is
implemented.  The plug disallows any HIGH, MED, or LOW priority message
from being transmitted until all processors have delivered all of their
RECOVERY messages.  Without the recovery plug, it would be possible for
a processor to order a low priority message while another processor is
still sending recovery messages.  I think we need to do this before the
release this year.  I'll start on this soon...
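
The gating rule of the plug can be sketched as follows.  This is an
illustration only: the enum, the struct and the function names are
hypothetical, and it assumes the plug simply counts the processors that
have not yet delivered all of their RECOVERY messages.

```c
#include <assert.h>

/* Message priorities named in this mail; the enum itself is hypothetical. */
enum msg_priority { PRIO_RECOVERY, PRIO_HIGH, PRIO_MED, PRIO_LOW };

struct recovery_plug {
    unsigned int pending;   /* processors still delivering RECOVERY messages */
};

/* Insert the plug when a configuration change starts recovery. */
void plug_insert(struct recovery_plug *plug, unsigned int processors)
{
    plug->pending = processors;
}

/* One processor has delivered all of its RECOVERY messages. */
void plug_processor_done(struct recovery_plug *plug)
{
    assert(plug->pending > 0);
    plug->pending--;
}

/* RECOVERY always passes; other priorities wait until the plug is out. */
int plug_may_transmit(const struct recovery_plug *plug,
                      enum msg_priority prio)
{
    return prio == PRIO_RECOVERY || plug->pending == 0;
}
```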

Thanks
-steve

On Fri, 2004-09-10 at 14:27, Mark Haverkamp wrote:
> On Fri, 2004-09-10 at 14:18, Steven Dake wrote:
> 
> > 
> > Ok it must be broken.  No other services are using it currently.  Is the
> > line evt.c:1986 (from bk) that is returning the incorrect values?
> 
> Yes, that is the place.
> 
> > 
> > I'll have a look at it.
> > 
> Thanks,
> Mark.



