[Openais] Two questions; Intro to AIS? and TOTEM errors in fence loop
Steven Dake
sdake at redhat.com
Mon Nov 2 15:37:36 PST 2009
On Tue, 2009-11-03 at 08:46 +1030, Darren Thompson wrote:
> Kelly et al
>
> My experience is from SLES (SUSE Enterprise) so you may need to
> "localise" this for your distribution.
>
> First of all, get rid of that crappy "network bridge fudge script" that
> XEN uses to monkey around with your comms when XEND starts. Find the
> file xend-config.sxp (or the equivalent xend config file).
> Find the network section where it explains the various bridging methods
> and look for the line "network-bridge". Comment this line out completely.
>
> Now using the appropriate network tools create a network bridge for each
> real NIC that you have (in this case br0, br1, br2).
> Within the bridge script, connect each of your real Ethernet ports to
> the bridge, remove any IP configuration from the Ethernet port, add the
> network configuration to the bridge. Do the same for both nodes.
> Note: you can also bond the NICs first, then connect the bond to the
> bridge. This also works, and adds NIC redundancy.
>
> This way the bridging is set up at network initialisation and is not
> changed partway through the boot process (the ugly Xen network script
> caused me all sorts of hassles with clusters, which is why I now do it
> this way).
>
> I hope this helps.
>
> Regards
> Darren
>
>
Darren
Thanks for the detailed explanation.
There are numerous issues with xend and openais operating together
because xen does weird networking stuff after openais is started.
I hope the above helps.
Regards
-steve
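
One way to confirm that the bridges described above really are in place before openais starts is to read the bridge membership from sysfs. The following is only a minimal sketch: it assumes a Linux kernel that lists bridge ports under /sys/class/net/<bridge>/brif, and the br0/br1/br2 to eth0/eth1/eth2 mapping and the function names are hypothetical, chosen for illustration.

```python
import os


def bridge_ports(bridge, sysfs_root="/sys/class/net"):
    """Return the sorted ports attached to `bridge`, or None if it is not a bridge.

    Assumes the kernel lists bridge members under <sysfs_root>/<bridge>/brif/.
    """
    brif = os.path.join(sysfs_root, bridge, "brif")
    if not os.path.isdir(brif):
        return None
    return sorted(os.listdir(brif))


def check_layout(expected, sysfs_root="/sys/class/net"):
    """Check that every expected port is attached to its bridge.

    `expected` maps bridge name -> list of port names. Extra ports (for
    example guest vif devices) are ignored. Returns a list of problems;
    an empty list means the layout matches.
    """
    problems = []
    for bridge, ports in expected.items():
        actual = bridge_ports(bridge, sysfs_root)
        if actual is None:
            problems.append(f"{bridge}: not present or not a bridge")
            continue
        missing = sorted(set(ports) - set(actual))
        if missing:
            problems.append(f"{bridge}: missing port(s) {', '.join(missing)}")
    return problems


if __name__ == "__main__":
    # Hypothetical layout: one bridge per physical NIC.
    expected = {"br0": ["eth0"], "br1": ["eth1"], "br2": ["eth2"]}
    for line in check_layout(expected) or ["bridge layout matches"]:
        print(line)
```

Run on both nodes before the cluster stack comes up; any line other than "bridge layout matches" means a NIC is not attached where expected.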
>
>
> On Mon, 2009-11-02 at 16:55 -0500, Madison Kelly wrote:
> > Hi all,
> >
> > I've been playing with clustering on CentOS 5.x lately and thus far
> > I've not really understood AIS' role in it. I've been to the AIS website
> > but there isn't much there.
> >
> > So my first question is: where can I go to learn the fundamentals of AIS?
> >
> > Second question relates to a real-world problem I've been having.
> > I've got a pretty simple 2-node cluster running DRBD+LVM on eth1. I use
> > eth0 between the two nodes as a back channel which connects to an
> > internal network via which we can manage the servers with IPMI. Lastly,
> > there is eth2 on each node that is Internet-facing.
> >
> > So far, I can't put eth0 under Xen control (that is, I can't have it
> > virtualized on dom0) without it causing a fence loop. I've tried asking
> > for help elsewhere but so far I've not heard back.
> >
> > The reason I am asking here now is that the node that gets fenced
> > shows this in its logs several times just before going down:
> >
> > Oct 31 00:27:21 vsh02 openais[3133]: [TOTEM] FAILED TO RECEIVE
> > Oct 31 00:27:21 vsh02 openais[3133]: [TOTEM] entering GATHER state from 6.
> >
> > On the surviving node I see this:
> >
> > Oct 31 00:35:47 vsh03 openais[3237]: [TOTEM] The token was lost in the
> > OPERATIONAL state.
> > Oct 31 00:35:47 vsh03 openais[3237]: [TOTEM] Receive multicast socket
> > recv buffer size (288000 bytes).
> > Oct 31 00:35:47 vsh03 openais[3237]: [TOTEM] Transmit multicast socket
> > send buffer size (262142 bytes).
> > Oct 31 00:35:47 vsh03 openais[3237]: [TOTEM] entering GATHER state from 2.
> > Oct 31 00:35:51 vsh03 openais[3237]: [TOTEM] entering GATHER state from 0.
> > Oct 31 00:35:51 vsh03 openais[3237]: [TOTEM] Creating commit token
> > because I am the rep.
> > Oct 31 00:35:51 vsh03 openais[3237]: [TOTEM] Saving state aru 2c high
> > seq received 2c
> > Oct 31 00:35:51 vsh03 openais[3237]: [TOTEM] Storing new sequence id for
> > ring 108
> > Oct 31 00:35:51 vsh03 openais[3237]: [TOTEM] entering COMMIT state.
> > Oct 31 00:35:51 vsh03 openais[3237]: [TOTEM] entering RECOVERY state.
> > Oct 31 00:35:51 vsh03 openais[3237]: [TOTEM] position [0] member
> > 10.255.135.3:
> > Oct 31 00:35:51 vsh03 openais[3237]: [TOTEM] previous ring seq 260 rep
> > 10.255.135.2
> > Oct 31 00:35:51 vsh03 openais[3237]: [TOTEM] aru 2c high delivered 2c
> > received flag 1
> > Oct 31 00:35:51 vsh03 openais[3237]: [TOTEM] Did not need to originate
> > any messages in recovery.
> > Oct 31 00:35:51 vsh03 openais[3237]: [TOTEM] Sending initial ORF token
> > Oct 31 00:35:51 vsh03 openais[3237]: [CLM ] CLM CONFIGURATION CHANGE
> > Oct 31 00:35:51 vsh03 openais[3237]: [CLM ] New Configuration:
> > Oct 31 00:35:51 vsh03 kernel: dlm: closing connection to node 1
> > Oct 31 00:35:51 vsh03 fenced[3256]: vsh02.domain.com not a cluster
> > member after 0 sec post_fail_delay
> > Oct 31 00:35:51 vsh03 openais[3237]: [CLM ] r(0) ip(10.255.135.3)
> > Oct 31 00:35:51 vsh03 fenced[3256]: fencing node "vsh02.domain.com"
> >
> > This happens when I put eth0 under Xen's management. The nodes will
> > keep fencing until DRBD breaks. At that point, the fencing stops and
> > everything seems to be fine. However, once I fix the DRBD partition and
> > try to look at the LVM the above errors return and I'm right back into a
> > fence loop until DRBD breaks again.
> >
> > Even a pointer to where I can learn more about AIS/TOTEM so that I
> > can try to understand what's going on would be awesome. I'm really stuck
> > on this error...
> >
> > Thanks!
> >
> > Madi
> > _______________________________________________
> > Openais mailing list
> > Openais at lists.linux-foundation.org
> > https://lists.linux-foundation.org/mailman/listinfo/openais
> >
>
> _______________________________________________
> Openais mailing list
> Openais at lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/openais