[Openais] Failover problem

Haussecker, Armin armin.haussecker at ts.fujitsu.com
Fri Apr 16 06:28:31 PDT 2010


Hi,

we have a 2-node cluster based on SLES11, openais (0.80.3-26.8.1) and pacemaker (1.0.5-0.5.6). Sometimes the failover from one node (named cuzzonib) to the second node (named cuzzonia) fails with the following messages:

Apr 16 13:16:14 cuzzonib lrmd: [6706]: info: Try to stop STONITH resource <rsc_id=iRMC_cuzzoniaInstance:0> : Device=external/ipmi
Apr 16 13:16:14 cuzzonib crmd: [18479]: info: process_lrm_event: LRM operation iRMC_cuzzoniaInstance:0_stop_0 (call=51, rc=0, cib-update=108, confirmed=true) ok
Apr 16 13:16:14 cuzzonib crmd: [18479]: info: match_graph_event: Action iRMC_cuzzoniaInstance:0_stop_0 (25) confirmed on cuzzonib (rc=0)

Apr 16 13:16:14 cuzzonib crmd: [18479]: info: te_pseudo_action: Pseudo action 29 fired and confirmed
Apr 16 13:16:14 cuzzonib crmd: [18479]: info: te_crm_command: Executing crm-event (79): do_shutdown on cuzzonib
Apr 16 13:16:14 cuzzonib crmd: [18479]: info: te_crm_command: crm-event (79) is a local shutdown

Apr 16 13:16:17 cuzzonib logger: /etc/xen/scripts/xen-hotplug-cleanup: XENBUS_PATH=backend/vkbd/4/0
Apr 16 13:16:17 cuzzonib logger: /etc/xen/scripts/xen-hotplug-cleanup: XENBUS_PATH=backend/console/4/0
Apr 16 13:16:17 cuzzonib logger: /etc/xen/scripts/xen-hotplug-cleanup: XENBUS_PATH=backend/vfb/4/0
Apr 16 13:16:17 cuzzonib logger: /etc/xen/scripts/xen-hotplug-cleanup: XENBUS_PATH=backend/vif/4/0
Apr 16 13:16:17 cuzzonib logger: /etc/xen/scripts/block: remove XENBUS_PATH=backend/vbd/4/51712
Apr 16 13:16:17 cuzzonib logger: /etc/xen/scripts/block: remove XENBUS_PATH=backend/vbd/4/51744
Apr 16 13:16:17 cuzzonib logger: /etc/xen/scripts/xen-hotplug-cleanup: XENBUS_PATH=backend/vbd/4/51712
Apr 16 13:16:17 cuzzonib logger: /etc/xen/scripts/xen-hotplug-cleanup: XENBUS_PATH=backend/vbd/4/51744

Apr 16 13:16:32 cuzzonib openais[18468]: [crm  ] notice: pcmk_shutdown: Still waiting for crmd (pid=18479, seq=6) to terminate...
Apr 16 13:16:38 cuzzonib openais[18468]: [TOTEM] The token was lost in the OPERATIONAL state.
Apr 16 13:16:38 cuzzonib openais[18468]: [TOTEM] Receive multicast socket recv buffer size (262142 bytes).
Apr 16 13:16:38 cuzzonib openais[18468]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Apr 16 13:16:38 cuzzonib openais[18468]: [TOTEM] entering GATHER state from 2.
Apr 16 13:16:58 cuzzonib openais[18468]: [TOTEM] entering GATHER state from 0.
Apr 16 13:16:58 cuzzonib openais[18468]: [TOTEM] Creating commit token because I am the rep.
Apr 16 13:16:58 cuzzonib openais[18468]: [TOTEM] Saving state aru 14b high seq received 14b
Apr 16 13:16:58 cuzzonib openais[18468]: [TOTEM] Storing new sequence id for ring bb4
Apr 16 13:16:58 cuzzonib openais[18468]: [TOTEM] entering COMMIT state.
Apr 16 13:16:58 cuzzonib openais[18468]: [TOTEM] entering RECOVERY state.
Apr 16 13:16:58 cuzzonib openais[18468]: [TOTEM] position [0] member 192.168.10.5:
Apr 16 13:16:58 cuzzonib openais[18468]: [TOTEM] previous ring seq 2992 rep 192.168.10.3
Apr 16 13:16:58 cuzzonib openais[18468]: [TOTEM] aru 14b high delivered 14b received flag 1
Apr 16 13:16:58 cuzzonib openais[18468]: [TOTEM] Did not need to originate any messages in recovery.
Apr 16 13:16:58 cuzzonib openais[18468]: [TOTEM] Sending initial ORF token
Apr 16 13:16:58 cuzzonib openais[18468]: [CLM  ] CLM CONFIGURATION CHANGE
Apr 16 13:16:58 cuzzonib openais[18468]: [CLM  ] New Configuration:
Apr 16 13:16:58 cuzzonib openais[18468]: [CLM  ]        r(0) ip(192.168.10.5)

Apr 16 13:16:58 cuzzonib openais[18468]: [CLM  ] Members Left:
Apr 16 13:16:58 cuzzonib crmd: [18479]: notice: ais_dispatch: Membership 2996: quorum lost
Apr 16 13:16:58 cuzzonib cib: [18475]: notice: ais_dispatch: Membership 2996: quorum lost
Apr 16 13:16:58 cuzzonib crmd: [18479]: info: ais_status_callback: status: cuzzonia is now lost (was member)

Apr 16 13:16:58 cuzzonib cib: [18475]: info: crm_update_peer: Node cuzzonia: id=51030208 state=lost (new) addr=r(0) ip(192.168.10.3)  votes=1 born=2992 seen=2992 proc=00000000000000000000000000053312

Afterwards, the second cluster node (cuzzonia) is rebooted.
What could be the cause of this problem?

Regards,
Armin Haussecker