[Openais] Node loss detection taking a long time

Andrew Beekhof abeekhof at suse.de
Wed Feb 4 08:14:26 PST 2009


On Feb 4, 2009, at 5:08 PM, Steven Dake wrote:

> 10 seconds. (10000 msec).

That was my impression too (based on the 'token' setting, right?).
Do you have any thoughts on what could have caused it to take three
times that long?

What information would you need to comment further?
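
For reference, here is a rough sketch of the arithmetic behind the "3 times"
figure, using only the two timestamps and the 'token' value from the quoted
mail below. It assumes the 22:52:08 line on s390vm14 marks roughly when the
node went away, which the truncated log line does not confirm, and it treats
the 3s clock difference between the hosts as a worst-case error in either
direction. The helper names are made up for illustration.

```python
from datetime import datetime

TOKEN_MS = 10000   # 'token' value from the quoted totem section
MAX_SKEW_S = 3     # the clocks on vm14 and vm16 are within 3s of each other


def parse_syslog_time(stamp):
    """Parse the time-of-day field of a syslog stamp like 'Feb  3 22:52:08'."""
    return datetime.strptime(stamp.split()[-1], "%H:%M:%S")


def detection_window(start_stamp, end_stamp, skew_s=MAX_SKEW_S):
    """Return (lowest, observed, highest) detection time in seconds.

    Assumes both stamps fall on the same day, as they do in the quoted logs.
    """
    start = parse_syslog_time(start_stamp)
    end = parse_syslog_time(end_stamp)
    gap = (end - start).total_seconds()
    return gap - skew_s, gap, gap + skew_s


# Assumption: the first stamp is when the node actually went down.
low, observed, high = detection_window("Feb  3 22:52:08", "Feb  3 22:52:35")

# Express the window as a multiple of the configured token timeout.
ratio_low = low * 1000 / TOKEN_MS
ratio_high = high * 1000 / TOKEN_MS

print("observed gap: %.0fs (between %.0fs and %.0fs allowing for skew)"
      % (observed, low, high))
print("that is %.1fx to %.1fx the token timeout of %d ms"
      % (ratio_low, ratio_high, TOKEN_MS))
```

Even at the low end of that window, the gap is well over twice the configured
token timeout, so clock skew alone would not explain it.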


>
>
> Regards
> -steve
>
> On Wed, 2009-02-04 at 16:21 +0100, Andrew Beekhof wrote:
>> Given the following totem section in openais.conf, how long would you
>> expect whitetank to take to notice that a node was down?
>>
>> totem {
>> 	token:          10000
>> 	token_retransmits_before_loss_const: 20
>> 	join:           60
>> 	consensus:      4800
>> 	vsftype:        none
>> 	max_messages:   20
>>
>> 	nodeid: 16
>> 	threads: 0
>> 	secauth: on
>> 	version: 2
>> 	interface {
>> 		ringnumber: 0
>> 		bindnetaddr: 192.168.1.0
>> 		mcastport: 5405
>> 		mcastaddr: 226.94.1.1
>> 	}
>> 	rrp_mode: passive
>> 	interface {
>> 		ringnumber: 1
>> 		bindnetaddr: 10.10.0.0
>> 		mcastport: 5406
>> 		mcastaddr: 226.94.1.10
>> 	}
>> }
>>
>> It seems to have taken 30s or so (the times on vm14 and 16 are within
>> 3s of each other).
>>
>> Feb  3 22:52:08 s390vm14 crmd: [28359]: debug: ...
>> Feb  3 22:52:35 s390vm16 openais[17354]: [TOTEM] The token was lost in
>> the OPERATIONAL state.
>>
>> And because it's a VM, it was up again before openais calculated a new
>> membership:
>>
>> Feb  3 22:52:39 s390vm16 openais[17354]: [TOTEM] Did not need to
>> originate any messages in recovery.
>> Feb  3 22:52:39 s390vm16 openais[17354]: [CLM  ] CLM CONFIGURATION
>> CHANGE
>> Feb  3 22:52:39 s390vm16 openais[17354]: [CLM  ] New Configuration:
>> Feb  3 22:52:39 s390vm16 openais[17354]: [CLM  ] 	r(0)
>> ip(192.168.1.13) r(1) ip(10.10.220.109)
>> Feb  3 22:52:39 s390vm16 openais[17354]: [CLM  ] 	r(0)
>> ip(192.168.1.14) r(1) ip(10.10.220.110)
>> Feb  3 22:52:39 s390vm16 openais[17354]: [CLM  ] 	r(0)
>> ip(192.168.1.16) r(1) ip(10.10.220.112)
>> Feb  3 22:52:39 s390vm16 openais[17354]: [CLM  ] Members Left:
>> Feb  3 22:52:39 s390vm16 openais[17354]: [CLM  ] Members Joined:
>> Feb  3 22:52:39 s390vm16 openais[17354]: [crm  ] notice:
>> global_confchg_fn: Stable membership event on ring 4320: memb=3,
>> new=0, lost=0
>> Feb  3 22:52:39 s390vm16 openais[17354]: [crm  ] info:
>> global_confchg_fn: MEMB: s390vm13 13
>> Feb  3 22:52:39 s390vm16 openais[17354]: [crm  ] info:
>> global_confchg_fn: MEMB: s390vm14 14
>> Feb  3 22:52:39 s390vm16 openais[17354]: [crm  ] info:
>> global_confchg_fn: MEMB: s390vm16 16
>>
>> This confuses the rest of the cluster a little, because it's as if
>> the node never left.
>> _______________________________________________
>> Openais mailing list
>> Openais at lists.linux-foundation.org
>> https://lists.linux-foundation.org/mailman/listinfo/openais
>


