[Openais] Re: A bug for AMF

Steven Dake sdake at mvista.com
Wed Oct 20 10:17:37 PDT 2004


Sakai-san,

A few comments..  I would like to clean up the coding style of my
original code.  If your interested in this work, could you:

rename readinessStateSetClusterpriority to
readiness_state_set_group_priority

rename haStateSetCluster to ha_state_set_group

rename readinessStateSetApi to readiness_state_set_api

I think your on the right track with amf_exec_synchronize.  The place
you call it from amf_componentregister seems a little strange.  I
believe what you are doing in the conditional to select to call this
function checks if the componentregister is from an executive (vs a
library).  Is this correct?  I don't really understand this path..

I had not thought of this until I read your patch... but I think your on
the right track.  Maybe what we need is something like:

1. synchronize state of all components in system

then calculate in one step before unplugging the ring the new
ha/readiness states for all components:

2. transition standby components that should be active to active/in
service (because there are not enough active components)
3. transition active components that should be standby to standby
(because there are too many active components).
4. fail over sg's which now have a broken service group because a
component left the configuration permanently.

That should provide a correct merge operation even if multiple
processors merge in one configuration change...  The state machine will
have to change to accomidate the new possible state transitions (too
many actives to standby, standby to active after a merge)

What do you think of this new approach based upon your current work?

Thanks
-steve

On Mon, 2004-10-18 at 18:21, Miyotaka Sakai wrote:
> Steve,
> 
> This Patch is added mesage IF (req_exec_amf_componentregister) change.
> In order to scynchronize component status as quickly as possible ,
> this change is neede ,I think.
>  (Or should I add another message IF ? )
> 
> Regarding SA_AMF_NOT_RESPONDING :
> In component_unregister () ,SA_AMF_NOT_RESPONDING is needed .
> But in component_register () ,SA_AMF_NOT_RESPONDING isn't needed.
> To avoid sending response to saAmfComponenRegister ,
> this patch takes another way.
> 
> Thanks
> -Miyotaka Sakai.
> 
> 
> Steven Dake wrote:
> 
> > Sakai-san
> > My apologies but I don't see any mails recently with a patch attached
> > from you.  Could you resend it?
> > 
> > Thanks
> > -steve
> > 
> > On Mon, 2004-10-18 at 16:02, Miyotaka Sakai wrote:
> > 
> >>Steve,
> >>
> >>I would like your comment and advice on the patch
> >>that was attacted to the E-mail I sent.
> >>
> >>I don't mind whether you give me severe comment.
> >>I'll reconsider and fix that.
> >>
> >>By the way, I have other jobs on bugzilla that are 140 and 157.
> >>I'll correct these bug first ?
> >>(I'm going to do that ,after I fix this bug.)
> >>
> >>regards
> >>-Miyotaka Sakai
> >>
> >>Miyotaka Sakai wrote:
> >>
> >>>Steve,
> >>>
> >>>Could you review the patch attached to E-mail instead of last one.
> >>>Some change is added.
> >>>
> >>>And response in line.
> >>>
> >>>Miyotaka Sakai wrote:
> >>>
> >>>
> >>>>Steave,
> >>>>
> >>>>I made the patch ,and attached this Email.
> >>>>This patch include 2 bugs (142,143).
> >>>>
> >>>>response in line .
> >>>>
> >>>>Steven Dake wrote:
> >>>>
> >>>>
> >>>>>Miyotaka-san
> >>>>>
> >>>>>I'm not sure if I was clear with dsmDisabledUnlockedFailed.  This is why
> >>>>>SA_AMF_NOT_RESPONDING must be removed from your patch to work.  The
> >>>>>reason is because you are sending a message with HIGH priority in this
> >>>>>case when doing RECOVERY messages which causes the state machine to
> >>>>>break down.
> >>>>>By encoding the priority into the state machine,
> >>>>>dsmDisabledUnlockedFailed will have the priority value to cause send
> >>>>>the right hastatesetcluster or readinesssattesetcluster messages. 
> >>>>
> >>>>
> >>>>I agree with you .
> >>>>I'll avoid using RECOVERY priority in readinessSetClusetr and
> >>>>  hastateSetCluseter .
> >>>>
> >>>>Regarding SA_AMF_NOT_RESPONDING :
> >>>>In component_unregister () ,SA_AMF_NOT_RESPONDING is needed .
> >>>>In this case ,all messages must not be deliverded to library.
> >>>>But in component_register () ,SA_AMF_NOT_RESPONDING isn't needed .
> >>>>In this case ,some messages has to be deliverded to library.
> >>>>For example HealthchekCallback and OtherCallbacks.
> >>>>Of course ,program shouldn't send response to saAmfComponentUnregister.
> >>>
> >>>
> >>>I misunderstood SA_AMF_NOT_RESPONDING.
> >>>When SA_AMF_NOT_RESPONDING is set, HealethcheckcCallback is delibderd.
> >>>I'm sorry about that.
> >>>
> >>>But this case ,SA_AMF_NOT_RESPONDING isn't needed.
> >>>Becase in order to avoid sending response to saAmfComponentUnregister,
> >>>this patch take another way.
> >>>
> >>>Thanks
> >>>- Miyotaka Sakai
> >>>
> >>>
> >>>>This Patch is added mesage IF (req_exec_amf_componentregister) change.
> >>>>In order to scynchronize component date quickly ,this change is neede 
> >>>>,I think. (Should I add another message IF ? )
> >>>>
> >>>>This patch work well comparably.
> >>>>
> >>>>Thanks
> >>>>- Miyotaka
> >>>>
> >>>>
> >>>>>Thanks
> >>>>>-steve
> >>
> >>
> > 
> 
> 
> ______________________________________________________________________
> --- openais.latest/exec/amf.c	2004-10-08 00:08:39.000000000 +0900
> +++ openais/exec/amf.c	2004-10-10 01:33:17.000000000 +0900
> @@ -147,8 +147,9 @@
>  static void component_unregister (
>  	struct saAmfComponent *component);
>  
> -static void component_register (
> -	struct saAmfComponent *component);
> +static void component_registerpriority (
> +	struct saAmfComponent *component,
> +	int priority);
>  
>  static void enumerate_components (
>  	void (*function)(struct saAmfComponent *, void *data),
> @@ -233,10 +234,6 @@
>  	struct saAmfComponent *component,
>  	void *data);
>  
> -static void amf_confchg_nsync (
> -	struct saAmfComponent *component,
> -	void *data);
> -
>  static int amf_confchg_fn (
>  	enum gmi_configuration_type configuration_type,
>      struct sockaddr_in *member_list, int member_list_entries,
> @@ -478,8 +475,9 @@
>  	return (0);
>  }
>  
> -static void component_unregister (
> -	struct saAmfComponent *component)
> +static void component_unregisterpriority (
> +	struct saAmfComponent *component,
> +	int priority)
>  {
>  	struct req_exec_amf_componentunregister req_exec_amf_componentunregister;
>  	struct iovec iovecs[2];
> @@ -509,12 +507,19 @@
>  	iovecs[0].iov_base = (char *)&req_exec_amf_componentunregister;
>  	iovecs[0].iov_len = sizeof (req_exec_amf_componentunregister);
>  
> -	assert (gmi_mcast (&aisexec_groupname, iovecs, 1, GMI_PRIO_MED) == 0);
> +	assert (gmi_mcast (&aisexec_groupname, iovecs, 1, priority) == 0);
>  }
>  
> -static void component_register (
> +static void component_unregister (
>  	struct saAmfComponent *component)
>  {
> +	component_unregisterpriority (component,GMI_PRIO_MED);
> +}
> +
> +static void component_registerpriority (
> +	struct saAmfComponent *component,
> +	int priority)
> +{
>  	struct req_exec_amf_componentregister req_exec_amf_componentregister;
>  	struct iovec iovecs[2];
>  
> @@ -526,13 +531,16 @@
>  	}
>  	log_printf (LOG_LEVEL_DEBUG, "component_register: registering component %s\n",
>  		getSaNameT (&component->name));
> -	component->probableCause = SA_AMF_NOT_RESPONDING;
>  
>  	req_exec_amf_componentregister.header.size = sizeof (struct req_exec_amf_componentregister);
>  	req_exec_amf_componentregister.header.id = MESSAGE_REQ_EXEC_AMF_COMPONENTREGISTER;
>  
>  	req_exec_amf_componentregister.source.conn_info = 0;
>  	req_exec_amf_componentregister.source.in_addr.s_addr = 0;
> +	req_exec_amf_componentregister.currentReadinessState = component->currentReadinessState;
> +	req_exec_amf_componentregister.newReadinessState = component->newReadinessState;
> +	req_exec_amf_componentregister.currentHAState = component->currentHAState;
> +	req_exec_amf_componentregister.newHAState = component->newHAState;
>  
>  	memset (&req_exec_amf_componentregister.req_lib_amf_componentregister,
>  		0, sizeof (struct req_lib_amf_componentregister));
> @@ -543,7 +551,7 @@
>  	iovecs[0].iov_base = (char *)&req_exec_amf_componentregister;
>  	iovecs[0].iov_len = sizeof (req_exec_amf_componentregister);
>  
> -	assert (gmi_mcast (&aisexec_groupname, iovecs, 1, GMI_PRIO_RECOVERY) == 0);
> +	assert (gmi_mcast (&aisexec_groupname, iovecs, 1, priority) == 0);
>  }
>  
>  /***
> @@ -747,9 +755,10 @@
>  }
>  #endif
>  
> -void haStateSetCluster (
> +static void haStateSetClusterpriority (
>  	struct saAmfComponent *component,
> -	SaAmfHAStateT haState)
> +	SaAmfHAStateT haState,
> +	int priority)
>  {
>  
>  	struct req_exec_amf_hastateset req_exec_amf_hastateset;
> @@ -766,7 +775,14 @@
>  	iovecs[0].iov_base = (char *)&req_exec_amf_hastateset;
>  	iovecs[0].iov_len = sizeof (req_exec_amf_hastateset);
>  
> -	assert (gmi_mcast (&aisexec_groupname, iovecs, 1, GMI_PRIO_HIGH) == 0);
> +	assert (gmi_mcast (&aisexec_groupname, iovecs, 1, priority) == 0);
> +}
> +
> +static void haStateSetCluster (
> +	struct saAmfComponent *component,
> +	SaAmfHAStateT haState)
> +{
> +	haStateSetClusterpriority (component, haState, GMI_PRIO_HIGH);
>  }
>  
>  void readinessStateSetApi (struct saAmfComponent *component,
> @@ -837,9 +853,10 @@
>  }
>  #endif
>  
> -void readinessStateSetCluster (
> +static void readinessStateSetClusterpriority (
>  	struct saAmfComponent *component,
> -	SaAmfReadinessStateT readinessState)
> +	SaAmfReadinessStateT readinessState,
> +	int priority)
>  {
>  
>  	struct req_exec_amf_readinessstateset req_exec_amf_readinessstateset;
> @@ -857,7 +874,14 @@
>  	iovecs[0].iov_base = (char *)&req_exec_amf_readinessstateset;
>  	iovecs[0].iov_len = sizeof (req_exec_amf_readinessstateset);
>  
> -	assert (gmi_mcast (&aisexec_groupname, iovecs, 1, GMI_PRIO_HIGH) == 0);
> +	assert (gmi_mcast (&aisexec_groupname, iovecs, 1, priority) == 0);
> +}
> +
> +static void readinessStateSetCluster(
> +	struct saAmfComponent *component,
> +	SaAmfReadinessStateT readinessState)
> +{
> +	readinessStateSetClusterpriority (component, readinessState, GMI_PRIO_HIGH);
>  }
>  
>  #ifdef CMOPILE_OUT
> @@ -907,6 +931,110 @@
>  	}
>  }
>  
> +static void dsmSynchronizeStaus (
> +	struct saAmfComponent *component)
> +{
> +	struct saAmfUnit *unit;
> +	struct list_head *list;
> +	int synccompleted;
> +	enum amfOperationalAdministrativeState unit_status;
> +	enum amfEnabledUnlockedState enablestate;
> +	enum amfDisabledUnlockedState disablestate;
> +
> +	if (component->currentReadinessState == component->newReadinessState) {
> +
> +		if (component->currentReadinessState == SA_AMF_OUT_OF_SERVICE) {
> +			component->disabledUnlockedState = AMF_DISABLED_UNLOCKED_REGISTEREDORERRORCANCEL;
> +			component->enabledUnlockedState = AMF_ENABLED_UNLOCKED_INITIAL;
> +
> +		} else if (component->currentReadinessState == SA_AMF_IN_SERVICE) {
> +			component->disabledUnlockedState = AMF_DISABLED_UNLOCKED_REGISTEREDORERRORCANCEL;
> +			component->enabledUnlockedState = AMF_ENABLED_UNLOCKED_IN_SERVICE_COMPLETED;
> +
> +		} else if  (component->currentReadinessState == SA_AMF_QUIESCED) {
> +			component->disabledUnlockedState = AMF_DISABLED_UNLOCKED_QUIESCED_COMPLETED;
> +			component->enabledUnlockedState = AMF_ENABLED_UNLOCKED_INITIAL;
> +		}
> +
> +	} else {
> +		if (component->newReadinessState == SA_AMF_OUT_OF_SERVICE) {
> +			component->disabledUnlockedState = AMF_DISABLED_UNLOCKED_OUT_OF_SERVICE_REQUESTED;
> +			component->enabledUnlockedState = AMF_ENABLED_UNLOCKED_INITIAL;
> +
> +		} else if (component->newReadinessState == SA_AMF_IN_SERVICE) {
> +			component->disabledUnlockedState = AMF_DISABLED_UNLOCKED_REGISTEREDORERRORCANCEL;
> +			component->enabledUnlockedState = AMF_ENABLED_UNLOCKED_IN_SERVICE_REQUESTED;
> +		} else {
> +			component->disabledUnlockedState = AMF_DISABLED_UNLOCKED_QUIESCED_REQUESTED;
> +			component->enabledUnlockedState = AMF_ENABLED_UNLOCKED_INITIAL;
> +		}
> +	}
> +
> +	if (component->currentHAState == component->newHAState) {
> +
> +		if (component->currentHAState == SA_AMF_ACTIVE) {
> +			component->disabledUnlockedState = AMF_DISABLED_UNLOCKED_REGISTEREDORERRORCANCEL;
> +			component->enabledUnlockedState = AMF_ENABLED_UNLOCKED_ACTIVE_COMPLETED;
> +
> +		} else if (component->currentHAState == SA_AMF_STANDBY) {
> +			component->disabledUnlockedState = AMF_DISABLED_UNLOCKED_REGISTEREDORERRORCANCEL;
> +			component->enabledUnlockedState = AMF_ENABLED_UNLOCKED_STANDBY_COMPLETED;
> +
> +		} else {
> +			/* depend on readiness status */
> +		}
> +
> +	} else {
> +		if (component->newHAState == SA_AMF_ACTIVE) {
> +			component->disabledUnlockedState = AMF_DISABLED_UNLOCKED_REGISTEREDORERRORCANCEL;
> +			component->enabledUnlockedState = AMF_ENABLED_UNLOCKED_ACTIVE_REQUESTED;
> +
> +		} else if (component->newHAState == SA_AMF_STANDBY) {
> +			component->disabledUnlockedState = AMF_DISABLED_UNLOCKED_REGISTEREDORERRORCANCEL;
> +			component->enabledUnlockedState = AMF_ENABLED_UNLOCKED_STANDBY_REQUESTED;
> +
> +		} else {
> +			component->disabledUnlockedState = AMF_DISABLED_UNLOCKED_QUIESCED_REQUESTED;
> +			component->enabledUnlockedState = AMF_ENABLED_UNLOCKED_INITIAL;
> +		}
> +	}
> +
> +	if (component->enabledUnlockedState != AMF_ENABLED_UNLOCKED_INITIAL) {
> +		unit_status = AMF_ENABLED_UNLOCKED;
> +
> +	} else {
> +		unit_status = AMF_DISABLED_UNLOCKED;
> +	}
> +
> +	enablestate = component->enabledUnlockedState;
> +	disablestate = component->disabledUnlockedState;
> +
> +	unit = component->saAmfUnit;
> +	for (synccompleted = 1, list = unit->saAmfComponentHead.next;
> +		list != &unit->saAmfComponentHead;
> +		list = list->next) {
> +
> +		component = list_entry (list, struct saAmfComponent, saAmfComponentList);
> +		if (component->disabledUnlockedState != disablestate
> +			|| component->enabledUnlockedState != enablestate){
> +			synccompleted = 0;
> +			break;
> +		}
> +	}
> +
> +	if (!synccompleted) {
> +		return;
> +	}
> +
> +	/* Syncronize Operational AdministrativeState */
> +	for (list = unit->saAmfComponentHead.next; list != &unit->saAmfComponentHead; list = list->next) {
> +		component = list_entry (list, struct saAmfComponent, saAmfComponentList);
> +		component->saAmfUnit->operationalAdministrativeState = unit_status;
> +	}
> +
> +	return;
> +}
> +
>  static void dsmDisabledUnlockedFailed (
>  	struct saAmfComponent *component)
>  {
> @@ -1266,7 +1394,7 @@
>  			/* noop - operational state */
>  			break;
>  		case AMF_ENABLED_UNLOCKED_STANDBY_REQUESTED:
> -			dsmEnabledUnlockedActiveRequested (component);
> +			dsmEnabledUnlockedStandbyRequested (component);
>  			break;
>  		case AMF_ENABLED_UNLOCKED_STANDBY_COMPLETED:
>  			/* noop - operational state */
> @@ -1695,38 +1823,44 @@
>  {
>  	struct in_addr *source_addr;
>  	source_addr = (struct in_addr *)data;
> +	struct saAmfUnit *unit;
> +	struct list_head *list;
>  
>  	if (component->source_addr.s_addr != source_addr->s_addr) {
>  		return;
>  	}
> -		
> -	component->local = 1;
> -	component_unregister (component);
>  
> -	return;
> -}
> -
> -void amf_confchg_njoin (struct saAmfComponent *component ,void *data)
> -{
> -
> -	if (component->source_addr.s_addr != this_ip.sin_addr.s_addr) {
> +	if (!component->registered) {
>  		return;
>  	}
>  
> -	component_register (component);
> +        /* Component status Initialize */
> +	unit = component->saAmfUnit;
> +	for (list = unit->saAmfComponentHead.next; list != &unit->saAmfComponentHead; list = list->next) {
> +		component->registered = 0;
> +		component->local = 0;
> +		component->disabledUnlockedState = AMF_DISABLED_UNLOCKED_REGISTEREDORERRORCANCEL;
> +		component->enabledUnlockedState = AMF_ENABLED_UNLOCKED_INITIAL;
> +		component->newReadinessState = SA_AMF_OUT_OF_SERVICE;
> +		component->currentReadinessState = SA_AMF_OUT_OF_SERVICE;
> +		component->newHAState = SA_AMF_QUIESCED;
> +		component->currentHAState = SA_AMF_QUIESCED;
> +		component->source_addr.s_addr = 0;
> +	}
> +
> +	component->saAmfUnit->operationalAdministrativeState = AMF_DISABLED_UNLOCKED;
> +	dsmDisabledUnlockedOutOfServiceCompleted (component);
>  	return;
>  }
>  
> -void amf_confchg_nsync (struct saAmfComponent *component, void *data)
> +void amf_confchg_njoin (struct saAmfComponent *component ,void *data)
>  {
> +
>  	if (component->source_addr.s_addr != this_ip.sin_addr.s_addr) {
>  		return;
>  	}
>  
> -	/* dsm change must be needed */
> -	readinessStateSetCluster (component, component->currentReadinessState);
> -	haStateSetCluster (component, component->currentHAState);
> -
> +	component_registerpriority (component, GMI_PRIO_RECOVERY);
>  	return;
>  }
>  
> @@ -1747,17 +1881,11 @@
>  	 */
>  	if ( joined_list_entries > 0 ) {
>  		enumerate_components (amf_confchg_njoin, NULL);
> -		enumerate_components (amf_confchg_nsync, NULL);
>  	}
>  
>  	/*
> -	 * Note: select ONLY the first processor to execute the nleave enumeration
>  	 * If node leave, component unregister
>  	 */
> -	if (member_list->sin_addr.s_addr != this_ip.sin_addr.s_addr) {
> -		return (0);
> -	}
> -
>  	for (i = 0; i<left_list_entries ; i++) {
>  		enumerate_components (amf_confchg_nleave, (void *)&(left_list[i].sin_addr));
>  	}
> @@ -1789,6 +1917,61 @@
>  	return (0);
>  }
>  
> +static void amf_exec_synchronize (void *message, struct in_addr source_addr)
> +{
> +	struct req_exec_amf_componentregister *req_exec_amf_componentregister = (struct req_exec_amf_componentregister *)message;
> +	struct saAmfComponent *component;
> +	struct saAmfComponent *amfProxyComponent;
> +
> +	log_printf (LOG_LEVEL_DEBUG, "Executive: ComponentSynchronize for component %s\n",
> +		getSaNameT (&req_exec_amf_componentregister->req_lib_amf_componentregister.compName));
> +
> +	/* Find Component */
> +	component = findComponent (&req_exec_amf_componentregister->req_lib_amf_componentregister.compName);
> +	amfProxyComponent = findComponent (&req_exec_amf_componentregister->req_lib_amf_componentregister.proxyCompName);
> +
> +	/* If this node is Component onwer */
> +	if (component->source_addr.s_addr == this_ip.sin_addr.s_addr) {
> +
> +		/* No Operation */
> +		return;
> +	}
> +
> +	/* If this isn't Synchronizing target Node */
> +	if (!(component->local == 0 &&  component->registered == 0)){ 
> +
> +		/* No Operation */
> +		return;
> +	}
> +
> +	/* Synchronize Status */
> +	component->local = 0;
> +	component->registered = 1;
> +	component->conn_info = req_exec_amf_componentregister->source.conn_info;
> +	component->source_addr = source_addr;
> +	component->currentReadinessState = SA_AMF_OUT_OF_SERVICE;
> +	component->newReadinessState = SA_AMF_OUT_OF_SERVICE;
> +	component->currentHAState = SA_AMF_QUIESCED;
> +	component->newHAState = SA_AMF_QUIESCED;
> +	component->probableCause = 0;
> +	component->enabledUnlockedState = 0;
> +	component->disabledUnlockedState = 0;
> +	component->currentReadinessState = req_exec_amf_componentregister->currentReadinessState;
> +	component->newReadinessState = req_exec_amf_componentregister->newReadinessState;
> +	component->currentHAState = req_exec_amf_componentregister->currentHAState;
> +	component->newHAState = req_exec_amf_componentregister->newHAState;
> +
> +	if (req_exec_amf_componentregister->req_lib_amf_componentregister.proxyCompName.length > 0) {
> +		component->saAmfProxyComponent = amfProxyComponent;
> +	}
> +
> +	/*
> +	 *  Determine if we should enter new state
> +	 */
> +	dsmSynchronizeStaus (component);
> +
> +	return;
> +}
>  
>  static int message_handler_req_exec_amf_componentregister (void *message, struct in_addr source_addr)
>  {
> @@ -1809,6 +1992,15 @@
>  	amfProxyComponent = findComponent (&req_exec_amf_componentregister->req_lib_amf_componentregister.proxyCompName);
>  
>  	/*
> +	 * If a node joining menber ship ,Component States Synchronize
> +	 */
> +	if (req_exec_amf_componentregister->source.in_addr.s_addr == 0) {
> +		amf_exec_synchronize (message, source_addr);
> +
> +		return (0);
> +	}
> +
> +	/*
>  	 * If component not in configuration files, return error
>  	 */
>  	if (component == 0) {
> @@ -1877,7 +2069,6 @@
>  			sizeof (struct res_lib_amf_componentregister));
>  	}
>  
> -	
>  	/*
>  	 * If no error on registration, determine if we should enter new state
>  	 */
> @@ -2323,7 +2514,6 @@
>  	return (0);
>  }
>  
> -
>  static int message_handler_req_amf_errorreport (struct conn_info *conn_info, void *message)
>  {
>  	struct req_lib_amf_errorreport *req_lib_amf_errorreport = (struct req_lib_amf_errorreport *)message;
> --- openais.latest/include/ais_msg.h	2004-10-08 00:09:48.000000000 +0900
> +++ openais/include/ais_msg.h	2004-10-07 21:42:40.000000000 +0900
> @@ -360,6 +360,10 @@
>  	struct req_header header;
>  	struct message_source source;
>  	struct req_lib_amf_componentregister req_lib_amf_componentregister;
> +	SaAmfReadinessStateT currentReadinessState;
> +	SaAmfReadinessStateT newReadinessState;
> +	SaAmfHAStateT currentHAState;
> +	SaAmfHAStateT newHAState;
>  };
>  
>  struct res_lib_amf_componentregister {




More information about the Openais mailing list