[Openais] Configuring multiple interfaces in Corosync

JiaQiang Xu xjqkilling at gmail.com
Wed Mar 9 00:14:29 PST 2011


2011/2/25 Steven Dake <sdake at redhat.com>:
> On 02/25/2011 01:48 AM, JiaQiang Xu wrote:
>> 2011/2/24 Steven Dake <sdake at redhat.com>:
>>> redundant ring is completely untested with udpu.  I would focus on
>>> getting udpu working first and go from there.
>>>
>>> passive offers better performance, active consumes more cpu with
>>> slightly lower latency.
>>>
>>
>> I did some tests on udpu and rrp mode. Here are my findings.
>>
>> First I tested 2 interfaces with rrp_mode=active. Here is my config on
>> one of the test nodes:
>>
>>       interface {
>>               member {
>>                       memberaddr: 192.168.1.3
>>               }
>>               member {
>>                       memberaddr: 192.168.1.4
>>               }
>>               ringnumber: 0
>>               bindnetaddr: 192.168.1.3
>>               mcastport: 4000
>>       }
>>       interface {
>>               member {
>>                       memberaddr: 192.168.2.3
>>               }
>>               member {
>>                       memberaddr: 192.168.2.4
>>               }
>>               ringnumber: 1
>>               bindnetaddr: 192.168.2.3
>>               mcastport: 3000
>>       }
>>       transport: udpu
>>
>> In most cases they seem to work smoothly together. But something
>> bad happens when I disable and re-enable one of the two physical interfaces.
>> After re-enabling the interface, corosync sometimes crashes with the following message:
>>
>> corosync: totemsrp.c:1194: memb_consensus_agreed: Assertion
>> `token_memb_entries >= 1' failed.
>>
>> (I did not forget to run "corosync-cfgtool -r" after re-enabling the interface.)
>>
>> This bug still exists even if I set rrp_mode=none and configure only one interface.
>> So I think this bug is not related to the integration of rrp and udpu.
>> If I use udp multicast instead, this problem disappears.
>> It may be a bug in the udpu code.
>>
>> I also found another bug (I believe) related to udpu:
>> I configure one udpu interface, without rrp on 2 nodes.
>> After a regular startup, crm_mon outputs on node 1:
>>
>> ============
>> Last updated: Fri Feb 25 16:43:31 2011
>> Stack: openais
>> Current DC: ubuntu-1 - partition with quorum
>> Version: 1.0.9-da7075976b5ff0bee71074385f8fd02f296ec8a3
>> 2 Nodes configured, 2 expected votes
>> 0 Resources configured.
>> ============
>>
>> Online: [ ubuntu-1 ubuntu-2 ]
>>
>> Then I manually disable the network interface. The crm_mon output on node 1 *doesn't change*.
>> While on node 2, we have:
>>
>> ============
>> Last updated: Fri Feb 25 16:43:29 2011
>> Stack: openais
>> Current DC: ubuntu-2 - partition WITHOUT quorum
>> Version: 1.0.9-da7075976b5ff0bee71074385f8fd02f296ec8a3
>> 2 Nodes configured, 2 expected votes
>> 0 Resources configured.
>> ============
>>
>> Node ubuntu-1: UNCLEAN (offline)
>> Online: [ ubuntu-2 ]
>>
>> This result is inconsistent with the result I get from udp multicast
>> configuration.
>> It seems the node that loses its connection fails to update the membership with udpu.
>>
>
> Thanks for the testing results.  One thing I noticed is you said you
> disabled the interface.  Does this mean you did ifconfig eth down?  See
> http://www.corosync.org/doku.php?id=faq:ifdown.  An ifdown operation in
> redundant ring has unknown, non-deterministic results.  Could you retest
> using iptables to do the fencing operation rather than ifconfig?  Then
> we can get some bugs filed and know where to look.

Thanks for your suggestion. I retested using iptables and all of the above
"bugs" disappeared.

Surprisingly, the rings also recover automatically when I unblock
the packets from the peer.
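
For anyone repeating the test, a minimal sketch of blocking and unblocking
the peer with iptables might look like the following. The exact rules are
not given in this thread, so the rule shape, the helper names and the
DRY_RUN switch are all assumptions; the peer address is the ring 0 address
of the other node in the config quoted above.

```shell
#!/bin/sh
# Sketch only: the rule shape and all names here are assumptions,
# not the rules actually used in the test described in this thread.
PEER="${PEER:-192.168.1.4}"    # peer's ring 0 address from the quoted config

# Print the command by default; set DRY_RUN=0 (as root) to really run it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Drop every incoming packet from the peer (simulates losing the link).
block_peer() {
    run iptables -A INPUT -s "$PEER" -j DROP
}

# Delete the same rule again so the peer's packets are accepted.
unblock_peer() {
    run iptables -D INPUT -s "$PEER" -j DROP
}

# Optional command-line entry point: "block" or "unblock".
case "${1:-}" in
    block)   block_peer ;;
    unblock) unblock_peer ;;
esac
```

Saved under a hypothetical name such as peerblock.sh, "DRY_RUN=0 sh
peerblock.sh block" followed later by "DRY_RUN=0 sh peerblock.sh unblock"
would add and remove the rule, and "corosync-cfgtool -s" can be used to
look at the ring status in between. Note that this only drops incoming
traffic on the node where it is run.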

BTW, why does Corosync behave non-deterministically when I ifdown
a physical interface?

>
>> BTW, what's your plan for testing udpu with rrp mode?
>>
>
> I personally would like to get redundant ring working well first
> including things like automatic ring recovery.  Unfortunately while a lot
> of people have interest in using redundant ring, not a lot of people have
> interest in working on the related code and fixing problems with it.  At
> this point, redundant ring (and udpu integration thereof) is at the
> bottom of my personal TODO list.  But anyone else is free to work on the
> code and supply patches.
>
> In most use cases, bonding works appropriately and is what I recommend
> for deployments.

