[Openais] Two questions; Intro to AIS? and TOTEM errors in fence loop

Darren Thompson darrent at akurit.com.au
Mon Nov 2 14:16:07 PST 2009


Kelly et al

My experience is from SLES (SUSE Enterprise) so you may need to
"localise" this for your distribution.

First of all, get rid of that crappy "network bridge fudge script" that
Xen uses to monkey around with your comms when xend starts. Find the
file xend-config.sxp (or the equivalent xend config file).
Find the network section where it explains the various bridging methods
and look for the line that references "network-bridge". Comment this
line out completely.
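
As a rough illustration of that change (a sketch only: it assumes the
stock setting looks like the line below and that your xend-config.sxp
uses "#" for comments; the exact form may differ between Xen releases
and distributions):

```
# Before: xend runs the network-bridge script when it starts
(network-script network-bridge)

# After: commented out, so xend leaves the host networking alone
#(network-script network-bridge)
```

Restart xend (or reboot) afterwards so the change takes effect.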

Now using the appropriate network tools create a network bridge for each
real NIC that you have (in this case br0, br1, br2).
Within the bridge script, connect each of your real Ethernet ports to
the bridge, remove any IP configuration from the Ethernet port, add the
network configuration to the bridge. Do the same for both nodes.
Note: you can also bond the NICs first, then connect the bond to the
bridge. This also works, and gives you NIC redundancy as well.
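
A minimal sketch of what this can look like on SLES, assuming the
sysconfig-style files under /etc/sysconfig/network/ and the eth0/br0
pair. The address is a placeholder and not taken from this thread, and
other distributions use different file names and keys:

```
# /etc/sysconfig/network/ifcfg-eth0
# The real NIC: brought up at boot, but with no IP configuration.
BOOTPROTO='none'
STARTMODE='auto'

# /etc/sysconfig/network/ifcfg-br0
# The bridge: carries the IP configuration that used to be on eth0.
# IPADDR below is a placeholder; use the node's real address.
BOOTPROTO='static'
IPADDR='192.0.2.10/24'
STARTMODE='auto'
BRIDGE='yes'
# If you bond first, list the bond (e.g. bond0) here instead of eth0.
BRIDGE_PORTS='eth0'
BRIDGE_STP='off'
BRIDGE_FORWARDDELAY='0'
```

Repeat the pair for eth1/br1 and eth2/br2, and on the second node.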

This way the bridging is set up as part of network initialisation and
is not changed partway through the boot process (the ugly Xen network
script caused me all sorts of hassles with clusters, which is why I now
do it this way).

I hope this helps.

Regards
Darren




On Mon, 2009-11-02 at 16:55 -0500, Madison Kelly wrote:
> Hi all,
> 
>    I've been playing with clustering on CentOS 5.x lately and thus far 
> I've not really understood AIS' role in it. I've been to the AIS website 
> but there isn't much there.
> 
>    So my first question is; Where can I go to learn the fundamentals of AIS?
> 
>    Second question relates to a real-world problem I've been having. 
> I've got a pretty simple 2-node cluster running DRBD+LVM on eth1. I use 
> eth0 between the two nodes as a back channel which connects to an 
> internal network via which we can manage the servers with IPMI. Lastly, 
> there is eth2 on each node that is Internet-facing.
> 
>    So far, I can't put eth0 under Xen control (that is, I can't have it 
> virtualized on dom0) without it causing a fence loop. I've tried asking 
> for help elsewhere but so far I've not heard back.
> 
>    The reason I am asking here now is that the node that gets fenced 
> shows this in its logs several times just before going down:
> 
> Oct 31 00:27:21 vsh02 openais[3133]: [TOTEM] FAILED TO RECEIVE
> Oct 31 00:27:21 vsh02 openais[3133]: [TOTEM] entering GATHER state from 6.
> 
>    On the surviving node I see this:
> 
> Oct 31 00:35:47 vsh03 openais[3237]: [TOTEM] The token was lost in the 
> OPERATIONAL state.
> Oct 31 00:35:47 vsh03 openais[3237]: [TOTEM] Receive multicast socket 
> recv buffer size (288000 bytes).
> Oct 31 00:35:47 vsh03 openais[3237]: [TOTEM] Transmit multicast socket 
> send buffer size (262142 bytes).
> Oct 31 00:35:47 vsh03 openais[3237]: [TOTEM] entering GATHER state from 2.
> Oct 31 00:35:51 vsh03 openais[3237]: [TOTEM] entering GATHER state from 0.
> Oct 31 00:35:51 vsh03 openais[3237]: [TOTEM] Creating commit token 
> because I am the rep.
> Oct 31 00:35:51 vsh03 openais[3237]: [TOTEM] Saving state aru 2c high 
> seq received 2c
> Oct 31 00:35:51 vsh03 openais[3237]: [TOTEM] Storing new sequence id for 
> ring 108
> Oct 31 00:35:51 vsh03 openais[3237]: [TOTEM] entering COMMIT state.
> Oct 31 00:35:51 vsh03 openais[3237]: [TOTEM] entering RECOVERY state.
> Oct 31 00:35:51 vsh03 openais[3237]: [TOTEM] position [0] member 
> 10.255.135.3:
> Oct 31 00:35:51 vsh03 openais[3237]: [TOTEM] previous ring seq 260 rep 
> 10.255.135.2
> Oct 31 00:35:51 vsh03 openais[3237]: [TOTEM] aru 2c high delivered 2c 
> received flag 1
> Oct 31 00:35:51 vsh03 openais[3237]: [TOTEM] Did not need to originate 
> any messages in recovery.
> Oct 31 00:35:51 vsh03 openais[3237]: [TOTEM] Sending initial ORF token
> Oct 31 00:35:51 vsh03 openais[3237]: [CLM  ] CLM CONFIGURATION CHANGE
> Oct 31 00:35:51 vsh03 openais[3237]: [CLM  ] New Configuration:
> Oct 31 00:35:51 vsh03 kernel: dlm: closing connection to node 1
> Oct 31 00:35:51 vsh03 fenced[3256]: vsh02.domain.com not a cluster 
> member after 0 sec post_fail_delay
> Oct 31 00:35:51 vsh03 openais[3237]: [CLM  ]     r(0) ip(10.255.135.3)
> Oct 31 00:35:51 vsh03 fenced[3256]: fencing node "vsh02.domain.com"
> 
>    This happens when I put eth0 under Xen's management. The nodes will 
> keep fencing until DRBD breaks. At that point, the fencing stops and 
> everything seems to be fine. However, once I fix the DRBD partition and 
> try to look at the LVM the above errors return and I'm right back into a 
> fence loop until DRBD breaks again.
> 
>    Even a pointer to where I can learn more about AIS/TOTEM so that I 
> can try to understand what's going on would be awesome. I'm really stuck 
> on this error...
> 
> Thanks!
> 
> Madi
> _______________________________________________
> Openais mailing list
> Openais at lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/openais
> 


