[Openais] Configuring multiple interfaces in Corosync
JiaQiang Xu
xjqkilling at gmail.com
Wed Mar 9 00:14:29 PST 2011
2011/2/25 Steven Dake <sdake at redhat.com>:
> On 02/25/2011 01:48 AM, JiaQiang Xu wrote:
>> 2011/2/24 Steven Dake <sdake at redhat.com>:
>>> redundant ring is completely untested with udpu. I would focus on
>>> getting udpu working first and go from there.
>>>
>>> passive offers better performance, active consumes more cpu with
>>> slightly lower latency.
>>>
>>
>> I did some tests on udpu and rrp mode. Here are my findings.
>>
>> First I tested 2 interfaces with rrp_mode=active. Here is my config on
>> one of the test nodes:
>>
>> interface {
>>     member {
>>         memberaddr: 192.168.1.3
>>     }
>>     member {
>>         memberaddr: 192.168.1.4
>>     }
>>     ringnumber: 0
>>     bindnetaddr: 192.168.1.3
>>     mcastport: 4000
>> }
>> interface {
>>     member {
>>         memberaddr: 192.168.2.3
>>     }
>>     member {
>>         memberaddr: 192.168.2.4
>>     }
>>     ringnumber: 1
>>     bindnetaddr: 192.168.2.3
>>     mcastport: 3000
>> }
>> transport: udpu
>>
>> In most cases they seem to work smoothly together. But something
>> bad happens when I disable and re-enable one of the two physical interfaces.
>> After re-enabling the interface, corosync sometimes crashes with the
>> following message:
>>
>> corosync: totemsrp.c:1194: memb_consensus_agreed: Assertion
>> `token_memb_entries >= 1' failed.
>>
>> (I did not forget to run "corosync-cfgtool -r" after re-enabling the interface.)
>>
>> This bug still exists even if I set rrp_mode=none and configure only one interface.
>> So I think this bug is not related to the integration of rrp and udpu.
>> If I use udp multicast instead, this problem disappears.
>> It may be a bug in the udpu code.
>>
>> I also found another bug (I believe) related to udpu:
>> I configured one udpu interface, without rrp, on 2 nodes.
>> After a regular startup, crm_mon outputs on node 1:
>>
>> ============
>> Last updated: Fri Feb 25 16:43:31 2011
>> Stack: openais
>> Current DC: ubuntu-1 - partition with quorum
>> Version: 1.0.9-da7075976b5ff0bee71074385f8fd02f296ec8a3
>> 2 Nodes configured, 2 expected votes
>> 0 Resources configured.
>> ============
>>
>> Online: [ ubuntu-1 ubuntu-2 ]
>>
>> Then I manually disabled the network interface. The crm_mon output on
>> node 1 *doesn't change*, while on node 2 we have:
>>
>> ============
>> Last updated: Fri Feb 25 16:43:29 2011
>> Stack: openais
>> Current DC: ubuntu-2 - partition WITHOUT quorum
>> Version: 1.0.9-da7075976b5ff0bee71074385f8fd02f296ec8a3
>> 2 Nodes configured, 2 expected votes
>> 0 Resources configured.
>> ============
>>
>> Node ubuntu-1: UNCLEAN (offline)
>> Online: [ ubuntu-2 ]
>>
>> This result is inconsistent with what I get from the udp multicast
>> configuration.
>> It seems the node that loses its connection fails to update the
>> membership with udpu.
>>
>
> Thanks for the testing results. One thing I noticed is you said you
> disabled the interface. Does this mean you did ifconfig eth down? See
> http://www.corosync.org/doku.php?id=faq:ifdown. An ifdown operation in
> redundant ring has unknown non-deterministic results. Could you retest
> using iptables to do the fencing operation rather than ifconfig? Then
> we can get some bugs filed and know where to look.
Thanks for your suggestion. I retested using iptables and all the above
"bugs" disappeared.
Surprisingly, the rings also recover automatically when I unblock
the packets from the peer.
BTW, why does Corosync behave non-deterministically when I ifdown
a physical interface?
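
For anyone who wants to repeat the iptables test, here is a minimal sketch
of the kind of rule involved. It is an illustration, not necessarily the
exact rules used in my test. The helper names (run, block_peer,
unblock_peer) and the DRY_RUN switch are made up for this example, and the
peer address is the ring 0 peer from the config quoted above. It simply
drops all UDP packets coming from the peer, which is broader than the
corosync ports alone.

```shell
#!/bin/sh
# Sketch: simulate a link failure by dropping the peer's UDP packets
# with iptables, instead of taking the interface down with ifconfig.
# All names below are hypothetical helpers for this example.

# DRY_RUN=1 (default) only prints the commands.
# Set DRY_RUN=0 and run as root to actually apply them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Append a rule that drops every UDP packet arriving from peer $1.
block_peer() {
    run iptables -A INPUT -s "$1" -p udp -j DROP
}

# Delete the same rule again so traffic from peer $1 is accepted.
unblock_peer() {
    run iptables -D INPUT -s "$1" -p udp -j DROP
}
```

On the node at 192.168.1.3, "block_peer 192.168.1.4" cuts ring 0 and
"unblock_peer 192.168.1.4" restores it. The ring status can be watched
with "corosync-cfgtool -s" while the rule is in place.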
>
>> BTW, what's your plan for testing udpu with rrp mode?
>>
>
> I personally would like to get redundant ring working well first,
> including things like automatic ring recovery. Unfortunately while a lot
> of people have interest in using redundant ring, not a lot of people have
> interest in working on the related code and fixing problems with it. At
> this point, redundant ring (and udpu integration thereof) is at the
> bottom of my personal TODO list. But anyone else is free to work on the
> code and supply patches.
>
> In most use cases, bonding works appropriately and is what I recommend
> for deployments.