[Openais] Re: A bug for AMF

Steven Dake sdake at mvista.com
Mon Oct 4 13:50:44 PDT 2004


Miyotaka-san

good work I think we are alot closer...  There are a few things to be
resolved..

comments in patch 

Thanks

On Mon, 2004-10-04 at 10:21, Miyotaka Sakai wrote:
> Steve,
> 
> I made a new patch for AMF .
> I would like you to review this patch .
> As you said ,I'll do seperate commit bug and debug mechanism.
> 
> And could you give me some advice abount the last my email.
> 
> thanks
> - Miyotaka Sakai.
> 
> 
> Miyotaka Sakai wrote:
> > Steve,
> > 
> > my responses inline .
> > 
> > 
> > Steven Dake wrote:
> > 
> >> Miyotaka-san
> >> comments inline
> >>
> >> On Sat, 2004-10-02 at 13:27, Miyotaka Sakai wrote:
> >>
> >>> steve,
> >>>
> >>> I've found a bug for AMF and made a patch .
> >>> I would like you to review this patch .
> >>>
> >>> Several gmi_mcast messages couldn't be delivered .
> >>> I changed gmi_mcast's message priorty from GMI_PRIO_HIGH to
> >>> GMI_PRIO_RECOVERY ,then messages could be deliverd .
> >>>
> >>> If you have another idea, Please tel me.
> >>>
> >>> This patch add several mechanisms to ease to debug .
> >>> ( a mechanism is watching AmfComponent contents )
> >>>
> >>
> >> The changing of priorities needs a different solution...  I think I
> >> messed this up for you when I added the recovery plug.  Could you fix it
> >> up?
> > 
> > I am sure. I will.
> > 
> >> The way it should work is this:
> >> messages sent during recovery (within the amf_confchg_fn function and
> >> all the other messages related to them) should be sent at recovery
> >> priority.  messages sent during normal operation should retain their
> >> current priorities.  Since the state machine must now (with the recovery
> >> plug) send messages at two different priorities depending on what state
> >> the system is in, we need to keep track of the priorities.  It is
> >> possible for the state machine to know the priority at which it should
> >> send messages by placing the priority in the first message that starts
> >> the state transitions.  Then every message that triggers a new message
> >> to be sent should encode the priority of the message to the next 
> >> state. If you need more details please ask.
> > 
> > OK.
> > I know what you mean .
> > For exsample ,haStateSetCluster relating amf_confchg_fn uses 
> > GMI_PRIO_RECOVERY ,but others use GMI_PRIO_HIGH.
> > It is possible ,and I 'll fix it .
> > 
> >> It is possible that the output queues used in gmi_mcast are full.  In
> >> this case, it is necessary to use gmi_token_callback_create.  Mark
> >> Haverkamp has implemented this in the event service if you need an
> >> example of using it.  This callback will be called the next time it may
> >> be possible to send messages.  The callback should then resend the state
> >> machine message that failed to send on an earlier try.
> > 
> > 
> > I refered to Mark's implementation and ,I understaned this 
> > implementation. Please take some time to implement this .
> > 
> >> Why the signal stuff in main.c?
> > 
> > 
> > I want to konw AmfComponents cotents at the time when I want to know .
> > ( I don't know when I want to know AmfComponents contens.
> >   It is different in each debug situation. )
> > In that case ,Most easy inmplementation is signal .
> > If you have ,Could you tell me another idea ?
> > 
> >> The added debug output looks fine.  Please try to use log_printf instead
> >> of printf though.  Could you check in just the debug output as a
> >> seperate commit after addressing the comment above?
> > 
> > 
> > I'll change printf into log_print.
> > 
> > Thanks
> > - Miyotaka Sakai.
> > 
> >>
> >> Thanks
> >> -steve
> >>
> >>
> >>> If you accept this patch ,I'll check-in .
> >>> Thanks !
> >>> - Miyotaka Sakai.
> > 
> 
> ______________________________________________________________________
> --- openais.2004-10-01.21.30/exec/amf.c	2004-10-02 13:30:03.000000000 +0900
> +++ openais/exec/amf.c	2004-10-05 01:56:48.000000000 +0900
> @@ -478,8 +478,9 @@
>  	return (0);
>  }
>  
> -static void component_unregister (
> -	struct saAmfComponent *component)
> +static void component_unregisterdtl (
> +	struct saAmfComponent *component,
> +

why unregisterdtl?  I dont understand what "dtl" means...

The patch is close but is missing something.  Priority is encoded into
the state machine transitions as well.  The reason is because the call
dsmDisabledUnlockedFailed() must call with the correct priority.  This
requires encoding the priority all along the call chain in component
register and unregister error report, etc.

Doing this has the side effect of requiring only one haStateSetCluster
and readinessStateSetCluster call instead of *tdl.

more comments below

> 	int priority)
>  {
>  	struct req_exec_amf_componentunregister req_exec_amf_componentunregister;
>  	struct iovec iovecs[2];
> @@ -509,12 +510,19 @@
>  	iovecs[0].iov_base = (char *)&req_exec_amf_componentunregister;
>  	iovecs[0].iov_len = sizeof (req_exec_amf_componentunregister);
>  
> -	assert (gmi_mcast (&aisexec_groupname, iovecs, 1, GMI_PRIO_MED) == 0);
> +	assert (gmi_mcast (&aisexec_groupname, iovecs, 1, priority) == 0);
>  }
>  
> -static void component_register (
> +static void component_unregister (
>  	struct saAmfComponent *component)
>  {
> +	component_unregisterdtl (component,GMI_PRIO_MED);
> +}
> +
> +static void component_registerdtl (
> +	struct saAmfComponent *component,
> +	int priority)
> +{
>  	struct req_exec_amf_componentregister req_exec_amf_componentregister;
>  	struct iovec iovecs[2];
>  
> @@ -526,7 +534,6 @@
>  	}
>  	log_printf (LOG_LEVEL_DEBUG, "component_register: registering component %s\n",
>  		getSaNameT (&component->name));
> -	component->probableCause = SA_AMF_NOT_RESPONDING;

^^ this change is very suspect..  I spent alot of time getting this to
work right when the application doesn't respond to a healthcheck (try
ctrl-z on the active or standby application to simulate this
condition).  I guess the reason you had to remove this was because the
if conditional in dsmDisabledUnlockedFailed (described above) failed to
transition correctly if it was set.

>  
>  	req_exec_amf_componentregister.header.size = sizeof (struct req_exec_amf_componentregister);
>  	req_exec_amf_componentregister.header.id = MESSAGE_REQ_EXEC_AMF_COMPONENTREGISTER;
> @@ -543,7 +550,7 @@
>  	iovecs[0].iov_base = (char *)&req_exec_amf_componentregister;
>  	iovecs[0].iov_len = sizeof (req_exec_amf_componentregister);
>  
> -	assert (gmi_mcast (&aisexec_groupname, iovecs, 1, GMI_PRIO_RECOVERY) == 0);
> +	assert (gmi_mcast (&aisexec_groupname, iovecs, 1, priority) == 0);
>  }
>  
>  /***
> @@ -747,9 +754,10 @@
>  }
>  #endif
>  
> -void haStateSetCluster (
> +static void haStateSetClusterdtl (
>  	struct saAmfComponent *component,
> -	SaAmfHAStateT haState)
> +	SaAmfHAStateT haState,
> +	int priority)
>  {
>  
>  	struct req_exec_amf_hastateset req_exec_amf_hastateset;
> @@ -766,7 +774,14 @@
>  	iovecs[0].iov_base = (char *)&req_exec_amf_hastateset;
>  	iovecs[0].iov_len = sizeof (req_exec_amf_hastateset);
>  
> -	assert (gmi_mcast (&aisexec_groupname, iovecs, 1, GMI_PRIO_HIGH) == 0);
> +	assert (gmi_mcast (&aisexec_groupname, iovecs, 1, priority) == 0);
> +}
> +
> +static void haStateSetCluster (
> +	struct saAmfComponent *component,
> +	SaAmfHAStateT haState)
> +{
> +	haStateSetClusterdtl (component, haState, GMI_PRIO_HIGH);
>  }
>  
>  void readinessStateSetApi (struct saAmfComponent *component,
> @@ -837,9 +852,10 @@
>  }
>  #endif
>  
> -void readinessStateSetCluster (
> +static void readinessStateSetClusterdtl(
>  	struct saAmfComponent *component,
> -	SaAmfReadinessStateT readinessState)
> +	SaAmfReadinessStateT readinessState,
> +	int priority)
>  {
>  
>  	struct req_exec_amf_readinessstateset req_exec_amf_readinessstateset;
> @@ -857,7 +873,14 @@
>  	iovecs[0].iov_base = (char *)&req_exec_amf_readinessstateset;
>  	iovecs[0].iov_len = sizeof (req_exec_amf_readinessstateset);
>  
> -	assert (gmi_mcast (&aisexec_groupname, iovecs, 1, GMI_PRIO_HIGH) == 0);
> +	assert (gmi_mcast (&aisexec_groupname, iovecs, 1, priority) == 0);
> +}
> +
> +static void readinessStateSetCluster(
> +	struct saAmfComponent *component,
> +	SaAmfReadinessStateT readinessState)
> +{
> +	readinessStateSetClusterdtl (component, readinessState, GMI_PRIO_HIGH);
>  }
>  
>  #ifdef CMOPILE_OUT
> @@ -1266,7 +1289,7 @@
>  			/* noop - operational state */
>  			break;
>  		case AMF_ENABLED_UNLOCKED_STANDBY_REQUESTED:
> -			dsmEnabledUnlockedActiveRequested (component);
> +			dsmEnabledUnlockedStandbyRequested (component);
>  			break;
>  		case AMF_ENABLED_UNLOCKED_STANDBY_COMPLETED:
>  			/* noop - operational state */
> @@ -1699,9 +1722,9 @@
>  	if (component->source_addr.s_addr != source_addr->s_addr) {
>  		return;
>  	}
> -		
> +
>  	component->local = 1;
> -	component_unregister (component);
> +	component_unregisterdtl (component, GMI_PRIO_RECOVERY);
>  
>  	return;
>  }
> @@ -1713,7 +1736,7 @@
>  		return;
>  	}
>  
> -	component_register (component);
> +	component_registerdtl (component, GMI_PRIO_RECOVERY);
>  	return;
>  }
>  
> @@ -1724,8 +1747,8 @@
>  	}
>  
>  	/* dsm change must be needed */
> -	readinessStateSetCluster (component, component->currentReadinessState);
> -	haStateSetCluster (component, component->currentHAState);
> +	readinessStateSetClusterdtl (component, component->currentReadinessState, GMI_PRIO_RECOVERY);
> +	haStateSetClusterdtl (component, component->currentHAState ,GMI_PRIO_RECOVERY);
>  
>  	return;
>  }
> @@ -2324,7 +2347,6 @@
>  	return (0);
>  }
>  
> -
>  static int message_handler_req_amf_errorreport (struct conn_info *conn_info, void *message)
>  {
>  	struct req_lib_amf_errorreport *req_lib_amf_errorreport = (struct req_lib_amf_errorreport *)message;
> @@ -2474,3 +2496,115 @@
>  	return (0);
>  }
>  

Could you make these switches instead of if's?  Or they could be a table
such as

/*
 * Make sure to sync to state machine variables (!)
 */
static char **disabled_unlocked_state_text = {
	"AMF_DISABLED_UNLOCKED_REGISTEREDORERRORCANCEL(%d)",
	"AMF_DISABLED_UNLOCKED_FAILED(%d)",

	...
};

static char *amf_disabledunlockedstate_ntoa (state) {
	static char str_disabledstate[256];

	sprintf (str_disabledstate, disabled_unlocked_state_text[state],
state);
	return (str_disabledstate);
}

> +static char *amf_disabledunlockedstate_ntoa (state)
> +{
> +	static char str_disabledstate[16];

^^^ this variable is not big enough to store the data


> +	if (state == AMF_DISABLED_UNLOCKED_REGISTEREDORERRORCANCEL) {
> +		sprintf (str_disabledstate, "AMF_DISABLED_UNLOCKED_REGISTEREDORERRORCANCEL(%d)", state);
> +	} else if (state == AMF_DISABLED_UNLOCKED_FAILED) {
> +		sprintf (str_disabledstate, "AMF_DISABLED_UNLOCKED_FAILED(%d)", state);
> +	} else if (state == AMF_DISABLED_UNLOCKED_QUIESCED_REQUESTED) {
> +		sprintf (str_disabledstate, "AMF_DISABLED_UNLOCKED_QUIESCED_REQUESTED(%d)", state);
> +	} else if (state == AMF_DISABLED_UNLOCKED_QUIESCED_COMPLETED) {
> +		sprintf (str_disabledstate, "AMF_DISABLED_UNLOCKED_QUIESCED_COMPLETED(%d)", state);
> +	} else if (state == AMF_DISABLED_UNLOCKED_OUT_OF_SERVICE_REQUESTED) {
> +		sprintf (str_disabledstate, "AMF_DISABLED_UNLOCKED_OUT_OF_SERVICE_REQUESTED(%d)", state);
> +	} else if (state == AMF_DISABLED_UNLOCKED_OUT_OF_SERVICE_COMPLETED) {
> +		sprintf (str_disabledstate, "AMF_DISABLED_UNLOCKED_OUT_OF_SERVICE_COMPLETED(%d)", state);
> +	} else {
> +		sprintf (str_disabledstate, "Unkown(%d)", state);
> +	}
> +
> +	return (str_disabledstate);
> +}
> +
> +static char *amf_enabledunlockedstate_ntoa (state)
> +{
> +	static char str_enabledunlockedstate[16];
> +	if (state == AMF_ENABLED_UNLOCKED_INITIAL) {
> +		sprintf (str_enabledunlockedstate, "AMF_ENABLED_UNLOCKED_INITIAL(%d)", state);
> +	} else if (state == AMF_ENABLED_UNLOCKED_IN_SERVICE_REQUESTED) {
> +		sprintf (str_enabledunlockedstate, "AMF_ENABLED_UNLOCKED_IN_SERVICE_REQUESTED(%d)", state);
> +	} else if (state == AMF_ENABLED_UNLOCKED_IN_SERVICE_COMPLETED) {
> +		sprintf (str_enabledunlockedstate, "AMF_ENABLED_UNLOCKED_IN_SERVICE_COMPLETED(%d)", state);
> +	} else if (state == AMF_ENABLED_UNLOCKED_ACTIVE_REQUESTED) {
> +		sprintf (str_enabledunlockedstate, "AMF_ENABLED_UNLOCKED_ACTIVE_REQUESTED(%d)", state);
> +	} else if (state == AMF_ENABLED_UNLOCKED_ACTIVE_COMPLETED) {
> +		sprintf (str_enabledunlockedstate, "AMF_ENABLED_UNLOCKED_ACTIVE_COMPLETED(%d)", state);
> +	} else if (state == AMF_ENABLED_UNLOCKED_STANDBY_REQUESTED) {
> +		sprintf (str_enabledunlockedstate, "AMF_ENABLED_UNLOCKED_STANDBY_REQUESTED(%d)", state);
> +	} else if (state == AMF_ENABLED_UNLOCKED_STANDBY_COMPLETED) {
> +		sprintf (str_enabledunlockedstate, "AMF_ENABLED_UNLOCKED_STANDBY_COMPLETED(%d)", state);
> +	} else {
> +		sprintf (str_enabledunlockedstate, "Unkown(%d)", state);
> +	}
> +
> +	return (str_enabledunlockedstate);
> +}
> +
> +static char *amf_readinessstate_ntoa (readinessstate)
> +{
> +	static char str_readiness[16];
> +	if (readinessstate == SA_AMF_OUT_OF_SERVICE) {
> +		sprintf (str_readiness, "SA_AMF_OUT_OF_SERVICE(%d)", readinessstate);
> +	} else if (readinessstate == SA_AMF_IN_SERVICE) {
> +		sprintf (str_readiness, "SA_AMF_IN_SERVICE(%d)", readinessstate);
> +	} else if (readinessstate == SA_AMF_STOPPING) {
> +		sprintf (str_readiness, "SA_AMF_STOPPING(%d)", readinessstate);
> +	} else {
> +		sprintf (str_readiness, "Unkown (%d)", readinessstate);
> +	}
> +
> +	return (str_readiness);
> +}
> +
> +static char *amf_hastate_ntoa (hastate)
> +{
> +	static char str_hastate[16];
> +	if (hastate == SA_AMF_ACTIVE) {
> +		sprintf (str_hastate, "SA_AMF_ACTIVE(%d)", hastate);
> +	} else if (hastate == SA_AMF_STANDBY) {
> +		sprintf (str_hastate, "SA_AMF_STANDBY(%d)", hastate);
> +	} else if (hastate == SA_AMF_QUIESCED) {
> +		sprintf (str_hastate, "SA_AMF_QUIESCED(%d)", hastate);
> +	} else {
> +		sprintf (str_hastate, "Unkown (%d)", hastate);
> +	}
> +
> +	return (str_hastate);
> +}
> +
^^ same comments apply from above


> +static void amf_dump_comp (struct saAmfComponent *component ,void *data)
> +{
> +	char name[64];
> +	data = NULL;
> +
> +	log_printf (LOG_LEVEL_DEBUG, "----------------\n" );
> +	log_printf (LOG_LEVEL_DEBUG, "registered            = %d\n" ,component->registered);
> +	log_printf (LOG_LEVEL_DEBUG, "local                 = %d\n" ,component->local );
> +	log_printf (LOG_LEVEL_DEBUG, "source_addr           = %s\n" ,inet_ntoa (component->source_addr));
> +	memset (name, 0 , sizeof(name));
> +	memcpy (name, component->name.value, component->name.length);
> +	log_printf (LOG_LEVEL_DEBUG, "name                  = %s\n" ,name );
> +	log_printf (LOG_LEVEL_DEBUG, "currentReadinessState = %s\n" ,amf_readinessstate_ntoa (component->currentReadinessState));
> +	log_printf (LOG_LEVEL_DEBUG, "newReadinessState     = %s\n" ,amf_readinessstate_ntoa (component->newReadinessState));
> +	log_printf (LOG_LEVEL_DEBUG, "currentHAState        = %s\n" ,amf_hastate_ntoa (component->currentHAState));
> +	log_printf (LOG_LEVEL_DEBUG, "newHAState            = %s\n" ,amf_hastate_ntoa (component->newHAState));
> +	log_printf (LOG_LEVEL_DEBUG, "enabledUnlockedState  = %s\n" ,amf_enabledunlockedstate_ntoa (component->enabledUnlockedState));
> +	log_printf (LOG_LEVEL_DEBUG,"disabledUnlockedState = %s\n" ,amf_disabledunlockedstate_ntoa (component->disabledUnlockedState));
> +	log_printf (LOG_LEVEL_DEBUG,"probableCause         = %d\n" ,component->probableCause );
> +}
> +

^^^ this debug output is really good work thanks for the effort here.


> +void amf_dump ( )
> +{
> +	pid_t pid;
> +
> +	pid = fork ();

why the fork?  I'd rather not have this unless it is absolutely
necessary..  It could create some unforseen problems or change behavior
during debugging.

> +	if (pid == 0) {
> +		enumerate_components (amf_dump_comp, NULL);
> +		fflush (stdout);
> +		exit (0);
> +	}
> +
> +	return;
> +}
> --- openais.2004-10-01.21.30/exec/amf.h	2004-10-02 13:30:03.000000000 +0900
> +++ openais/exec/amf.h	2004-10-03 04:00:13.000000000 +0900
> @@ -52,6 +52,7 @@
>  	int trackActive;
>  };
>  
> +void amf_dump ();
>  extern struct service_handler amf_service_handler;
>  
>  #endif /* AMF_H_DEFINED */
> --- openais.2004-10-01.21.30/exec/main.c	2004-10-02 13:30:03.000000000 +0900
> +++ openais/exec/main.c	2004-10-05 01:47:36.000000000 +0900
> @@ -143,6 +143,21 @@
>  	return (conn_info);
>  }
>  
> +#ifdef DEBUG
> +static void aisexec_sigusr2 ( )
> +{
> +	amf_dump ();
> +
> +	signal (SIGUSR2 ,aisexec_sigusr2);
> +	return;
> +}
> +
> +static void aisexec_signal ( )
> +{
> +	signal (SIGUSR2 ,aisexec_sigusr2);
> +	return;
> +}
> +#endif
>  

this signal is confusing to me..  You call aisexec_signal which sets
signal to aisexec_sigusr2?  What is the purpose of this code?

the prototype is not right for a signal handler it should be (example
from main.c) :

void sigintr_handler (int signum)


>  struct sockaddr_in this_ip;
>  #define LOCALHOST_IP inet_addr("127.0.0.1")
> @@ -873,6 +888,10 @@
>  
>  	aisexec_poll_handle = poll_create ();
>  
> +#ifdef DEBUG
> +	aisexec_signal ( );
> +#endif
> +
just call:
	signal (SIGUSR2, sigusr2_handler);


>  	/*
>  	 * if gmi_init doesn't have root priveleges, it cannot
>  	 * bind to a specific interface.  This only matters if




More information about the Openais mailing list