[Openais] corosync ring marked FAULTY - administrative intervention required
Vadym Chepkov
vchepkov at gmail.com
Fri Apr 9 04:45:08 PDT 2010
Hi,
I experience this issue on every cluster I have, not just this one, so it could be a common misconfiguration on my part.
I am using the latest version of corosync:
corosync-1.2.1-1.el5
Here is my config:
compatibility: none
aisexec {
    user: root
    group: root
}
service {
    name: pacemaker
    ver: 0
}
totem {
    version: 2
    token: 5000
    token_retransmits_before_loss_const: 20
    join: 1000
    consensus: 7500
    vsftype: none
    max_messages: 20
    secauth: off
    threads: 0
    clear_node_high_bit: yes
    rrp_mode: passive
    interface {
        ringnumber: 0
        broadcast: yes
        bindnetaddr: 10.0.0.0
        mcastport: 5405
    }
    interface {
        ringnumber: 1
        broadcast: yes
        bindnetaddr: 207.207.163.0
        mcastport: 5406
    }
}
logging {
    fileline: off
    to_stderr: no
    to_syslog: yes
    debug: on
    timestamp: on
}
amf {
    mode: disabled
}
[root at xen-11 ~]# ifconfig
eth0 Link encap:Ethernet HWaddr 00:30:48:62:4E:DC
inet addr:207.207.163.11 Bcast:207.207.163.255 Mask:255.255.255.0
inet6 addr: fe80::230:48ff:fe62:4edc/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:2009418 errors:0 dropped:0 overruns:0 frame:0
TX packets:799835 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:1428434820 (1.3 GiB) TX bytes:664164837 (633.3 MiB)
eth1 Link encap:Ethernet HWaddr 00:30:48:62:4E:DD
inet addr:10.0.0.1 Bcast:10.0.0.3 Mask:255.255.255.252
inet6 addr: fe80::230:48ff:fe62:4edd/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:4233811 errors:0 dropped:0 overruns:0 frame:0
TX packets:14118095 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:518593446 (494.5 MiB) TX bytes:14199338528 (13.2 GiB)
Memory:d8060000-d8080000
[root at xen-12 ~]# ifconfig
eth0 Link encap:Ethernet HWaddr 00:30:48:62:4C:CA
inet addr:207.207.163.12 Bcast:207.207.163.255 Mask:255.255.255.0
inet6 addr: fe80::230:48ff:fe62:4cca/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1210002 errors:0 dropped:0 overruns:0 frame:0
TX packets:473204 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:698444593 (666.0 MiB) TX bytes:1145344594 (1.0 GiB)
eth1 Link encap:Ethernet HWaddr 00:30:48:62:4C:CB
inet addr:10.0.0.2 Bcast:10.0.0.3 Mask:255.255.255.252
inet6 addr: fe80::230:48ff:fe62:4ccb/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:13776771 errors:0 dropped:0 overruns:0 frame:0
TX packets:4008079 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:14138136203 (13.1 GiB) TX bytes:493569061 (470.7 MiB)
Memory:d8060000-d8080000
eth1 is a crossover connection between the two nodes.
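As a sanity check on the interface sections, bindnetaddr is normally set to the network address of the interface corosync should bind to. The following is a minimal sketch that recomputes the network addresses from the ifconfig output of xen-11 and compares them with the configured bindnetaddr values. The helper name network_of and the two dictionaries are illustrative only, and a match here does not by itself explain the FAULTY ring.

```python
import ipaddress


def network_of(addr: str, mask: str) -> str:
    """Return the network address for an interface address and netmask."""
    iface = ipaddress.ip_interface(f"{addr}/{mask}")
    return str(iface.network.network_address)


# Addresses and netmasks copied from the ifconfig output of xen-11.
interfaces = {
    "eth1": ("10.0.0.1", "255.255.255.252"),
    "eth0": ("207.207.163.11", "255.255.255.0"),
}

# bindnetaddr values copied from the totem section (ring 0 on eth1, ring 1 on eth0).
bindnetaddrs = {"eth1": "10.0.0.0", "eth0": "207.207.163.0"}

for name, (addr, mask) in interfaces.items():
    net = network_of(addr, mask)
    status = "matches" if net == bindnetaddrs[name] else "DIFFERS from"
    print(f"{name}: network {net} {status} bindnetaddr {bindnetaddrs[name]}")
```

With the values from this post, both computed networks equal the configured bindnetaddr values, so the bind addresses themselves look consistent.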
I don't see much detail in the message log; I probably need to increase the debug level.
[root at xen-12 ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 33554442
RING ID 0
id = 10.0.0.2
status = ring 0 active with no faults
RING ID 1
id = 207.207.163.12
status = Marking seqid 6594 ringid 1 interface 207.207.163.12 FAULTY - adminisrtative intervention required.
I can reset it just fine:
[root at xen-12 ~]# corosync-cfgtool -r
Re-enabling all failed rings.
[root at xen-12 ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 33554442
RING ID 0
id = 10.0.0.2
status = ring 0 active with no faults
RING ID 1
id = 207.207.163.12
status = ring 1 active with no faults
But it goes into FAULTY mode almost right away:
Apr 9 11:40:56 xen-12 corosync[13835]: [TOTEM ] Marking seqid 18340 ringid 1 interface 207.207.163.12 FAULTY - adminisrtative intervention required.
That is the only message from corosync in the log.
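To see how quickly a ring returns to FAULTY after a reset, the output of corosync-cfgtool -s can be checked by a small script. The sketch below parses the status text in the format shown in this post and returns the ring IDs reported as FAULTY. The function name faulty_rings is hypothetical, the sample is copied from the output above, and the parsing assumes the status format does not differ in other corosync versions.

```python
import re
from typing import List


def faulty_rings(cfgtool_output: str) -> List[int]:
    """Return the ring IDs whose status line contains FAULTY."""
    faulty = []
    current = None  # ring ID of the section being read
    for raw in cfgtool_output.splitlines():
        line = raw.strip()
        match = re.match(r"RING ID (\d+)$", line)
        if match:
            current = int(match.group(1))
        elif line.startswith("status") and "FAULTY" in line and current is not None:
            faulty.append(current)
    return faulty


# Sample copied verbatim from the corosync-cfgtool -s output in this post.
SAMPLE = """\
Printing ring status.
Local node ID 33554442
RING ID 0
id = 10.0.0.2
status = ring 0 active with no faults
RING ID 1
id = 207.207.163.12
status = Marking seqid 6594 ringid 1 interface 207.207.163.12 FAULTY - adminisrtative intervention required.
"""

print(faulty_rings(SAMPLE))
```

On a live node, the same function could be fed the captured output of corosync-cfgtool -s, for example via subprocess.run with capture_output=True and text=True, and called in a loop after corosync-cfgtool -r to time the transition.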
Thank you,
Vadym Chepkov