[Openais] Node loss detection taking a long time

Andrew Beekhof abeekhof at suse.de
Wed Feb 4 07:21:36 PST 2009


Given the following totem section in openais.conf, how long would you
expect whitetank to take to notice that a node was down?

totem {
	token:          10000
	token_retransmits_before_loss_const: 20
	join:           60
	consensus:      4800
	vsftype:        none
	max_messages:   20

	nodeid: 16
	threads: 0
	secauth: on
	version: 2
	interface {
		ringnumber: 0
		bindnetaddr: 192.168.1.0
		mcastport: 5405
		mcastaddr: 226.94.1.1
	}
	rrp_mode: passive
	interface {
		ringnumber: 1
		bindnetaddr: 10.10.0.0
		mcastport: 5406
		mcastaddr: 226.94.1.10
	}
}
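
For a back-of-the-envelope number to compare against, here is a naive
reading of the timeouts above. It is only a sketch under an assumed
model: that token loss is declared after `token` ms and that forming the
new membership then costs up to one `consensus` period. The helper name
naive_detection_ms is made up for illustration, and the real totem
timing (particularly with rrp_mode: passive) may well differ.

```python
# Timeouts copied from the totem section above (all in milliseconds).
totem = {
    "token": 10000,
    "token_retransmits_before_loss_const": 20,
    "join": 60,
    "consensus": 4800,
}


def naive_detection_ms(cfg):
    """Assumed model (not taken from the openais source):
    token-loss timeout plus one consensus round."""
    return cfg["token"] + cfg["consensus"]


estimate_ms = naive_detection_ms(totem)
print(f"naive estimate: {estimate_ms / 1000:.1f}s")
```

Under that assumption the estimate is well under the 30s or so that was
actually observed, which is why the delay looks surprising.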

It seems to have taken 30s or so (the clocks on vm14 and vm16 are
within 3s of each other):

Feb  3 22:52:08 s390vm14 crmd: [28359]: debug: ...
Feb  3 22:52:35 s390vm16 openais[17354]: [TOTEM] The token was lost in the OPERATIONAL state.
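
To put numbers on the gap, the syslog timestamps quoted in this mail can
be subtracted directly. This is a small sketch: the year 2009 is assumed
from the date of this post (syslog omits it), the 22:52:39 stamp is the
membership change logged on vm16, and the helper name parse_stamp is
only illustrative. It also ignores the up-to-3s clock skew between the
two hosts.

```python
from datetime import datetime


def parse_stamp(stamp, year=2009):
    """Parse a syslog-style 'Feb  3 22:52:08' stamp; the year is assumed."""
    normalized = " ".join(stamp.split())  # collapse the padded day field
    return datetime.strptime(f"{year} {normalized}", "%Y %b %d %H:%M:%S")


crmd_debug = parse_stamp("Feb  3 22:52:08")   # crmd debug line on s390vm14
token_lost = parse_stamp("Feb  3 22:52:35")   # [TOTEM] token lost on s390vm16
new_config = parse_stamp("Feb  3 22:52:39")   # [CLM] configuration change

to_token_loss = (token_lost - crmd_debug).total_seconds()
to_new_config = (new_config - crmd_debug).total_seconds()
print(f"until token loss: {to_token_loss:.0f}s")
print(f"until new membership: {to_new_config:.0f}s")
```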

And because it's a VM, it was up again before openais calculated a new
membership:

Feb  3 22:52:39 s390vm16 openais[17354]: [TOTEM] Did not need to originate any messages in recovery.
Feb  3 22:52:39 s390vm16 openais[17354]: [CLM  ] CLM CONFIGURATION CHANGE
Feb  3 22:52:39 s390vm16 openais[17354]: [CLM  ] New Configuration:
Feb  3 22:52:39 s390vm16 openais[17354]: [CLM  ] 	r(0) ip(192.168.1.13) r(1) ip(10.10.220.109)
Feb  3 22:52:39 s390vm16 openais[17354]: [CLM  ] 	r(0) ip(192.168.1.14) r(1) ip(10.10.220.110)
Feb  3 22:52:39 s390vm16 openais[17354]: [CLM  ] 	r(0) ip(192.168.1.16) r(1) ip(10.10.220.112)
Feb  3 22:52:39 s390vm16 openais[17354]: [CLM  ] Members Left:
Feb  3 22:52:39 s390vm16 openais[17354]: [CLM  ] Members Joined:
Feb  3 22:52:39 s390vm16 openais[17354]: [crm  ] notice: global_confchg_fn: Stable membership event on ring 4320: memb=3, new=0, lost=0
Feb  3 22:52:39 s390vm16 openais[17354]: [crm  ] info: global_confchg_fn: MEMB: s390vm13 13
Feb  3 22:52:39 s390vm16 openais[17354]: [crm  ] info: global_confchg_fn: MEMB: s390vm14 14
Feb  3 22:52:39 s390vm16 openais[17354]: [crm  ] info: global_confchg_fn: MEMB: s390vm16 16

Which confuses the rest of the cluster a fraction, because it's as if
the node never left.

