[Openais] Node loss detection taking a long time

Steven Dake sdake at redhat.com
Wed Feb 4 08:31:55 PST 2009


On Wed, 2009-02-04 at 17:14 +0100, Andrew Beekhof wrote:
> On Feb 4, 2009, at 5:08 PM, Steven Dake wrote:
> 
> > 10 seconds. (10000 msec).
> 
> That was my impression (based on the 'token' setting right?) too.
> Have you any thoughts on what could have caused it to take 3 times that?
> 
> What information would you need to comment further?
> 

No clue why it would take longer or how to debug it.  Might run
wireshark with the patch recently posted to the list to see what the
protocol is actually doing.
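
For reference, the expected timings from the totem settings quoted below can be sketched roughly as follows. This is a back-of-the-envelope estimate, assuming token loss is declared after the full token timeout and a new membership forms after an additional consensus interval; the retransmit-spacing line is a guess about how the retransmit constant is used, not code taken from the openais source:

```python
# Values from the quoted totem {} section of openais.conf.
token_ms = 10000       # token: timeout before a lost token is declared
consensus_ms = 4800    # consensus: wait for agreement on a new membership
retransmits = 20       # token_retransmits_before_loss_const

# Assumption: the retransmit constant only spaces out token
# retransmissions *within* the token timeout; it does not extend it.
token_retransmit_ms = token_ms // (retransmits + 1)  # rough spacing guess

expected_detect_ms = token_ms                      # ~10 s to declare token loss
expected_membership_ms = token_ms + consensus_ms   # ~14.8 s to a new membership

print(expected_detect_ms, expected_membership_ms)
```

On those assumptions, token loss should be declared in roughly 10 s and a new membership formed in under 15 s. The logs below show about 27 s to "The token was lost" (roughly triple the expected value), while the ~4 s from token loss to the new configuration does line up with the consensus setting.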

Regards
-steve

> 
> >
> >
> > Regards
> > -steve
> >
> > On Wed, 2009-02-04 at 16:21 +0100, Andrew Beekhof wrote:
> >> Given the following totem section in openais.conf, how long would you
> >> expect whitetank to notice the node was down?
> >>
> >> totem {
> >> 	token:          10000
> >> 	token_retransmits_before_loss_const: 20
> >> 	join:           60
> >> 	consensus:      4800
> >> 	vsftype:        none
> >> 	max_messages:   20
> >>
> >> 	nodeid: 16
> >> 	threads: 0
> >> 	secauth: on
> >> 	version: 2
> >> 	interface {
> >> 		ringnumber: 0
> >> 		bindnetaddr: 192.168.1.0
> >> 		mcastport: 5405
> >> 		mcastaddr: 226.94.1.1
> >> 	}
> >> 	rrp_mode: passive
> >> 	interface {
> >> 		ringnumber: 1
> >> 		bindnetaddr: 10.10.0.0
> >> 		mcastport: 5406
> >> 		mcastaddr: 226.94.1.10
> >> 	}
> >> }
> >>
> >> It seems to have taken 30s or so (the times on vm14 and 16 are within
> >> 3s of each other).
> >>
> >> Feb  3 22:52:08 s390vm14 crmd: [28359]: debug: ...
> >> Feb  3 22:52:35 s390vm16 openais[17354]: [TOTEM] The token was lost  
> >> in
> >> the OPERATIONAL state.
> >>
> >> And because it's a VM, it was up again before openais calculated a new
> >> membership:
> >>
> >> Feb  3 22:52:39 s390vm16 openais[17354]: [TOTEM] Did not need to
> >> originate any messages in recovery.
> >> Feb  3 22:52:39 s390vm16 openais[17354]: [CLM  ] CLM CONFIGURATION
> >> CHANGE
> >> Feb  3 22:52:39 s390vm16 openais[17354]: [CLM  ] New Configuration:
> >> Feb  3 22:52:39 s390vm16 openais[17354]: [CLM  ] 	r(0)
> >> ip(192.168.1.13) r(1) ip(10.10.220.109)
> >> Feb  3 22:52:39 s390vm16 openais[17354]: [CLM  ] 	r(0)
> >> ip(192.168.1.14) r(1) ip(10.10.220.110)
> >> Feb  3 22:52:39 s390vm16 openais[17354]: [CLM  ] 	r(0)
> >> ip(192.168.1.16) r(1) ip(10.10.220.112)
> >> Feb  3 22:52:39 s390vm16 openais[17354]: [CLM  ] Members Left:
> >> Feb  3 22:52:39 s390vm16 openais[17354]: [CLM  ] Members Joined:
> >> Feb  3 22:52:39 s390vm16 openais[17354]: [crm  ] notice:
> >> global_confchg_fn: Stable membership event on ring 4320: memb=3,
> >> new=0, lost=0
> >> Feb  3 22:52:39 s390vm16 openais[17354]: [crm  ] info:
> >> global_confchg_fn: MEMB: s390vm13 13
> >> Feb  3 22:52:39 s390vm16 openais[17354]: [crm  ] info:
> >> global_confchg_fn: MEMB: s390vm14 14
> >> Feb  3 22:52:39 s390vm16 openais[17354]: [crm  ] info:
> >> global_confchg_fn: MEMB: s390vm16 16
> >>
> >> Which confuses the rest of the cluster a fraction, because it's as if
> >> the node never left.
> >> _______________________________________________
> >> Openais mailing list
> >> Openais at lists.linux-foundation.org
> >> https://lists.linux-foundation.org/mailman/listinfo/openais
> >
> 


