[Openais] corosync ring marked FAULTY - administrative intervention required

Steven Dake sdake at redhat.com
Mon Apr 12 07:56:11 PDT 2010


On Mon, 2010-04-12 at 05:50 -0700, Vadym Chepkov wrote:
> --- On Fri, 4/9/10, Steven Dake <sdake at redhat.com> wrote:
> 
> > 
> > Broadcast and redundant ring probably don't work too well together.  If
> > you really want to use broadcast, take care to ensure port numbers are
> > separated by 2.  In your config, you're using port 5405 for one ring and
> > 5406 for another.  Internally totem will use 5405+5404 for one ring, and
> > 5405+5406 for another.  With multicast this isn't a problem since you
> > could use different multicast addresses.  With broadcast, this is not the
> > case.
> > 
> > Try fixing that and report back if it helps.  If not we can further
> > investigate.
> > 
> > Regards
> > -steve
> > 
> 
> I have changed the ports and it did help, thank you. The reason I was
> using broadcast is that my second ring is a crossover cable. I wasn't
> sure whether multicast makes any sense on such an interface. I also
> didn't know whether I can have one redundant ring with multicast and
> another with broadcast. I would really like to know how an expert would
> configure corosync in my setup (two nodes, two Ethernet cards each,
> connected to a common switch, with a crossover link between them).
> 
> Thank you,
> Vadym
> 

Multicast should work ok if your switch works properly.

I'd recommend using multicast if you can, even with a crossover cable.
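
As a rough, untested sketch of what that could look like for the two
interface sections: the bindnetaddr values and rrp_mode are taken from the
config you posted, while the multicast addresses 226.94.1.1 and 226.94.1.2
and the port 5407 are placeholder assumptions chosen only for illustration.

```
totem {
        version: 2
        rrp_mode: passive
        # Other totem options (token, join, consensus and so on) stay as
        # in the posted config.

        # Ring 0: crossover link on eth1.
        interface {
                ringnumber: 0
                bindnetaddr: 10.0.0.0
                # Assumed placeholder multicast address.
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }

        # Ring 1: switched network on eth0.
        interface {
                ringnumber: 1
                bindnetaddr: 207.207.163.0
                # Assumed placeholder; differs from ring 0 on purpose.
                mcastaddr: 226.94.1.2
                # Kept 2 apart from ring 0 as a precaution.
                mcastport: 5407
        }
}
```

If broadcast is kept instead, each interface would carry "broadcast: yes"
in place of mcastaddr, and the two mcastport values should still be 2 apart
(5405 and 5407 here), because each ring also uses the port one below its
mcastport.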

Regards
-steve

> > 
> > > corosync-1.2.1-1.el5
> > > 
> > > Here is my config:
> > > 
> > > compatibility: none
> > > 
> > > aisexec {
> > >         user:   root
> > >         group:  root
> > > }
> > > 
> > > service {
> > >         name: pacemaker
> > >         ver:  0
> > > }
> > > 
> > > totem {
> > >         version: 2
> > >         token: 5000
> > >         token_retransmits_before_loss_const: 20
> > >         join: 1000
> > >         consensus: 7500
> > >         vsftype: none
> > >         max_messages: 20
> > >         secauth: off
> > >         threads: 0
> > >         clear_node_high_bit: yes
> > >         rrp_mode: passive
> > >         interface {
> > >                 ringnumber: 0
> > >                 broadcast: yes
> > >                 bindnetaddr: 10.0.0.0
> > >                 mcastport: 5405
> > >         }
> > >         interface {
> > >                 ringnumber: 1
> > >                 broadcast: yes
> > >                 bindnetaddr: 207.207.163.0
> > >                 mcastport: 5406
> > >         }
> > > }
> > > 
> > > logging {
> > >         fileline: off
> > >         to_stderr: no
> > >         to_syslog: yes
> > >         debug: on
> > >         timestamp: on
> > > }
> > > 
> > > amf {
> > >         mode: disabled
> > > }
> > > 
> > > [root@xen-11 ~]# ifconfig
> > > eth0      Link encap:Ethernet  HWaddr 00:30:48:62:4E:DC
> > >           inet addr:207.207.163.11  Bcast:207.207.163.255  Mask:255.255.255.0
> > >           inet6 addr: fe80::230:48ff:fe62:4edc/64 Scope:Link
> > >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> > >           RX packets:2009418 errors:0 dropped:0 overruns:0 frame:0
> > >           TX packets:799835 errors:0 dropped:0 overruns:0 carrier:0
> > >           collisions:0 txqueuelen:0
> > >           RX bytes:1428434820 (1.3 GiB)  TX bytes:664164837 (633.3 MiB)
> > >
> > > eth1      Link encap:Ethernet  HWaddr 00:30:48:62:4E:DD
> > >           inet addr:10.0.0.1  Bcast:10.0.0.3  Mask:255.255.255.252
> > >           inet6 addr: fe80::230:48ff:fe62:4edd/64 Scope:Link
> > >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> > >           RX packets:4233811 errors:0 dropped:0 overruns:0 frame:0
> > >           TX packets:14118095 errors:0 dropped:0 overruns:0 carrier:0
> > >           collisions:0 txqueuelen:1000
> > >           RX bytes:518593446 (494.5 MiB)  TX bytes:14199338528 (13.2 GiB)
> > >           Memory:d8060000-d8080000
> > > 
> > > [root@xen-12 ~]# ifconfig
> > > eth0      Link encap:Ethernet  HWaddr 00:30:48:62:4C:CA
> > >           inet addr:207.207.163.12  Bcast:207.207.163.255  Mask:255.255.255.0
> > >           inet6 addr: fe80::230:48ff:fe62:4cca/64 Scope:Link
> > >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> > >           RX packets:1210002 errors:0 dropped:0 overruns:0 frame:0
> > >           TX packets:473204 errors:0 dropped:0 overruns:0 carrier:0
> > >           collisions:0 txqueuelen:0
> > >           RX bytes:698444593 (666.0 MiB)  TX bytes:1145344594 (1.0 GiB)
> > >
> > > eth1      Link encap:Ethernet  HWaddr 00:30:48:62:4C:CB
> > >           inet addr:10.0.0.2  Bcast:10.0.0.3  Mask:255.255.255.252
> > >           inet6 addr: fe80::230:48ff:fe62:4ccb/64 Scope:Link
> > >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> > >           RX packets:13776771 errors:0 dropped:0 overruns:0 frame:0
> > >           TX packets:4008079 errors:0 dropped:0 overruns:0 carrier:0
> > >           collisions:0 txqueuelen:1000
> > >           RX bytes:14138136203 (13.1 GiB)  TX bytes:493569061 (470.7 MiB)
> > >           Memory:d8060000-d8080000
> > > 
> > > Cross-over connection on eth1
> > > 
> > > I don't see much detail in the message log; I probably need to
> > > increase the debug level.
> > > 
> > > [root@xen-12 ~]# corosync-cfgtool -s
> > > Printing ring status.
> > > Local node ID 33554442
> > > RING ID 0
> > >     id    = 10.0.0.2
> > >     status    = ring 0 active with no faults
> > > RING ID 1
> > >     id    = 207.207.163.12
> > >     status    = Marking seqid 6594 ringid 1 interface 207.207.163.12 FAULTY - adminisrtative intervention required.
> > > 
> > > 
> > > I can reset it just fine
> > > 
> > > [root@xen-12 ~]# corosync-cfgtool -r
> > > Re-enabling all failed rings.
> > > [root@xen-12 ~]# corosync-cfgtool -s
> > > Printing ring status.
> > > Local node ID 33554442
> > > RING ID 0
> > >     id    = 10.0.0.2
> > >     status    = ring 0 active with no faults
> > > RING ID 1
> > >     id    = 207.207.163.12
> > >     status    = ring 1 active with no faults
> > > 
> > > But it goes into FAULTY mode almost right away:
> > > 
> > > Apr  9 11:40:56 xen-12 corosync[13835]:   [TOTEM ] Marking seqid 18340 ringid 1 interface 207.207.163.12 FAULTY - adminisrtative intervention required.
> > > 
> > > That's the only message from corosync in the log.
> > > 
> > > Thank you,
> > > Vadym Chepkov
> > > 
> > > _______________________________________________
> > > Openais mailing list
> > > Openais at lists.linux-foundation.org
> > > https://lists.linux-foundation.org/mailman/listinfo/openais
> > 