[Openais] Configuring multiple interfaces in Corosync

Steven Dake sdake at redhat.com
Wed Mar 9 00:46:27 PST 2011


On 03/09/2011 01:14 AM, JiaQiang Xu wrote:
> 2011/2/25 Steven Dake <sdake at redhat.com>:
>> On 02/25/2011 01:48 AM, JiaQiang Xu wrote:
>>> 2011/2/24 Steven Dake <sdake at redhat.com>:
>>>> Redundant ring is completely untested with udpu.  I would focus on
>>>> getting udpu working first and go from there.
>>>>
>>>> Passive offers better performance; active consumes more cpu with
>>>> slightly lower latency.
>>>>
>>>
>>> I did some tests on udpu and rrp mode. Here are my findings.
>>>
>>> First I tested 2 interfaces with rrp_mode=active. Here is my config on
>>> one of the test nodes:
>>>
>>>       interface {
>>>               member {
>>>                       memberaddr: 192.168.1.3
>>>               }
>>>               member {
>>>                       memberaddr: 192.168.1.4
>>>               }
>>>               ringnumber: 0
>>>               bindnetaddr: 192.168.1.3
>>>               mcastport: 4000
>>>       }
>>>       interface {
>>>               member {
>>>                       memberaddr: 192.168.2.3
>>>               }
>>>               member {
>>>                       memberaddr: 192.168.2.4
>>>               }
>>>               ringnumber: 1
>>>               bindnetaddr: 192.168.2.3
>>>               mcastport: 3000
>>>       }
>>>       transport: udpu
>>>
>>> In most cases they seem to work smoothly together. But something
>>> bad happens when I disable and re-enable one of the two physical interfaces.
>>> After re-enabling the interface, corosync sometimes crashes with the following message:
>>>
>>> corosync: totemsrp.c:1194: memb_consensus_agreed: Assertion
>>> `token_memb_entries >= 1' failed.
>>>
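The active/passive trade-off described earlier in the thread can be made concrete with a toy model. This is not corosync code: it is a minimal Python sketch assuming the usual description of the Totem redundant ring protocol, in which active mode transmits every message on every ring and the receiver discards the extra copies, while passive mode transmits each message on a single ring chosen round-robin. The class names and the `simulate` helper are hypothetical.

```python
from itertools import cycle


class ActiveSender:
    """Active replication (model): every message is transmitted on every ring."""

    def __init__(self, rings):
        self.rings = list(rings)

    def send(self, seq, payload):
        # One (ring, seq, payload) frame per ring.
        return [(ring, seq, payload) for ring in self.rings]


class PassiveSender:
    """Passive replication (model): each message goes out on one ring, round-robin."""

    def __init__(self, rings):
        self._rings = cycle(list(rings))

    def send(self, seq, payload):
        return [(next(self._rings), seq, payload)]


class Receiver:
    """Delivers each sequence number once and counts the duplicate frames it drops."""

    def __init__(self):
        self.seen = set()
        self.delivered = []
        self.duplicates = 0

    def receive(self, frames):
        for _ring, seq, payload in frames:
            if seq in self.seen:
                self.duplicates += 1  # extra work that active mode pays for
                continue
            self.seen.add(seq)
            self.delivered.append(payload)


def simulate(sender, n):
    """Send n messages through sender; return (frames on the wire, receiver)."""
    rx = Receiver()
    frames = 0
    for seq in range(n):
        out = sender.send(seq, f"msg-{seq}")
        frames += len(out)
        rx.receive(out)
    return frames, rx


if __name__ == "__main__":
    for name, sender in [("active", ActiveSender([0, 1])), ("passive", PassiveSender([0, 1]))]:
        frames, rx = simulate(sender, 100)
        print(name, frames, len(rx.delivered), rx.duplicates)
```

Running it prints `active 200 100 100` and `passive 100 100 0`: with two rings, active mode puts twice as many frames on the wire and filters a duplicate for every message, which is where the extra CPU goes; in return, each message still arrives if one ring drops its copy.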

Disable with ifdown?  I can tell you for certain that any ifdown/ifup of an
interface used in a redundant ring will not work and will probably do
something you don't want (like crash).

This assertion is, however, known and understood to some degree, but
there is currently no resolution.

>>> (I did not forget to run "corosync-cfgtool -r" after re-enabling the interface.)
>>>
>>> This bug still exists even if I set rrp_mode=none and configure only one interface.
>>> So I think this bug is not related to the integration of rrp and udpu.
>>> If I use udp multicast instead, this problem disappears.
>>> It may be a bug in the udpu code.
>>>
>>> I also found another bug (I believe) related to udpu:
>>> I configured one udpu interface, without rrp, on 2 nodes.
>>> After a regular startup, crm_mon shows on node 1:
>>>
>>> ============
>>> Last updated: Fri Feb 25 16:43:31 2011
>>> Stack: openais
>>> Current DC: ubuntu-1 - partition with quorum
>>> Version: 1.0.9-da7075976b5ff0bee71074385f8fd02f296ec8a3
>>> 2 Nodes configured, 2 expected votes
>>> 0 Resources configured.
>>> ============
>>>
>>> Online: [ ubuntu-1 ubuntu-2 ]
>>>
>>> Then I manually disabled the network interface. The crm_mon output on node 1 *doesn't change*,
>>> while on node 2 we have:
>>>
>>> ============
>>> Last updated: Fri Feb 25 16:43:29 2011
>>> Stack: openais
>>> Current DC: ubuntu-2 - partition WITHOUT quorum
>>> Version: 1.0.9-da7075976b5ff0bee71074385f8fd02f296ec8a3
>>> 2 Nodes configured, 2 expected votes
>>> 0 Resources configured.
>>> ============
>>>
>>> Node ubuntu-1: UNCLEAN (offline)
>>> Online: [ ubuntu-2 ]
>>>
>>> This result is inconsistent with the result I get from a udp multicast
>>> configuration.
>>> It seems the node that loses its connection fails to update the membership with udpu.
>>>
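The two crm_mon views can be checked against a simple-majority quorum rule. The sketch below is illustrative only: it assumes one vote per node and that a partition is quorate when its votes exceed half of `expected_votes`; the helper names are hypothetical and are not Pacemaker's API.

```python
def has_quorum(votes, expected_votes):
    """Simple-majority rule (assumed): quorate when votes exceed half the expected votes."""
    return votes > expected_votes // 2


def partition_view(members, expected_votes, votes_per_node=1):
    """Describe a partition using the wording of crm_mon's 'Current DC' line."""
    votes = len(members) * votes_per_node
    return "with quorum" if has_quorum(votes, expected_votes) else "WITHOUT quorum"


if __name__ == "__main__":
    expected = 2  # "2 Nodes configured, 2 expected votes"
    # Node 2 sees only itself after the link is cut.
    print("ubuntu-2:", partition_view({"ubuntu-2"}, expected))
    # Node 1 still lists both members in the udpu test, so it stays quorate.
    print("ubuntu-1:", partition_view({"ubuntu-1", "ubuntu-2"}, expected))
    # An updated membership would leave node 1 alone as well.
    print("ubuntu-1 (updated):", partition_view({"ubuntu-1"}, expected))
```

With `expected_votes` of 2, a lone node has 1 vote and is not quorate, which matches node 2's "partition WITHOUT quorum". Node 1 still counts both members, so it keeps quorum; had its membership been updated, it would see itself alone and lose quorum too.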
>>
>> Thanks for the testing results.   One thing I noticed is you said you
>> disabled the interface.  Does this mean you did ifconfig eth down?  See
>> http://www.corosync.org/doku.php?id=faq:ifdown.  An ifdown operation in
>> a redundant ring has unknown, non-deterministic results.  Could you retest
>> using iptables to do the fencing operation rather than ifconfig?  Then
>> we can get some bugs filed and know where to look.
> 
> Thanks for your suggestion. I retested using iptables and all the above "bugs"
> disappeared.
> 
> Surprisingly, the rings also recover automatically when I unblock
> the packets from the peer.
> 
> BTW, why does Corosync behave non-deterministically when I ifdown
> a physical interface?
> 

The short answer is that corosync is not yet perfect ;)

A longer answer, with use cases, is here:

http://www.corosync.org/doku.php?id=faq:ifdown

Both use cases need to be supported in a method that behaves
consistently.  If anyone wants to take on improving this behavior, I'd
definitely take patches.
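Following the suggestion earlier in the thread, a link failure can be simulated with iptables instead of ifdown: the interface stays up and keeps its address, and only the packets to and from the peer are dropped. The minimal Python sketch below just builds and optionally runs the iptables commands, using only standard iptables options (`-A`/`-D`, `-i`/`-o`, `-s`/`-d`, `-j DROP`). The interface name `eth1` and the helper names are hypothetical, and actually executing the rules requires root.

```python
import shlex
import subprocess


def peer_block_rules(peer_ip, iface, action="-A"):
    """Return iptables argv lists that drop traffic to and from one peer on one interface.

    action "-A" appends the DROP rules (block); "-D" deletes them again (unblock).
    """
    if action not in ("-A", "-D"):
        raise ValueError("action must be '-A' or '-D'")
    return [
        ["iptables", action, "INPUT", "-i", iface, "-s", peer_ip, "-j", "DROP"],
        ["iptables", action, "OUTPUT", "-o", iface, "-d", peer_ip, "-j", "DROP"],
    ]


def run_rules(rules, dry_run=True):
    """Print each command; execute it only when dry_run is False (requires root)."""
    for argv in rules:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Cut ring 1 (the 192.168.2.x interface in the config earlier in the thread)
    # between the two test nodes, then restore it. "eth1" is a hypothetical name.
    run_rules(peer_block_rules("192.168.2.4", "eth1"))
    run_rules(peer_block_rules("192.168.2.4", "eth1", action="-D"))
```

Deleting the same rules with `-D` restores traffic; in the iptables retest reported in this thread, the rings then recovered automatically.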


>>
>>> BTW, what's your plan for testing udpu with rrp mode?
>>>
>>
>> I personally would like to get redundant ring working well first,
>> including things like automatic ring recovery.  Unfortunately, while a lot
>> of people have interest in using redundant ring, not a lot of people have
>> interest in working on the related code and fixing problems with it.  At
>> this point, redundant ring (and udpu integration thereof) is at the
>> bottom of my personal TODO list.  But anyone else is free to work on the
>> code and supply patches.
>>
>> In most use cases, bonding works appropriately and is what I recommend
>> for deployments.


