[Openais] corosync 1.3 crashes on totem loss

Steven Dake sdake at redhat.com
Tue Mar 15 10:01:17 PDT 2011


On 03/15/2011 08:27 AM, Dejan Muhamedagic wrote:
> Hi,
> 
> On Tue, Mar 15, 2011 at 07:46:30AM -0700, Steven Dake wrote:
>> On 03/14/2011 06:05 PM, AP wrote:
>>> Hi,
>>>
>>> Just had severe network flakiness here and found corosync vanishing from
>>> the process list on one of the nodes. Initially this was due to packet loss,
>>> but just now it was due to multicast not being enabled properly, so that
>>> the node in question could send multicast packets but not receive them.
>>>
>>> Attached is the corosync-fplay output as well as a bt full of the core
>>> file. The OS is Debian squeeze (libc 2.11.2), kernel 2.6.37.2.
>>>
>>> AP
>>>
>>>
>>>
>>> _______________________________________________
>>> Openais mailing list
>>> Openais at lists.linux-foundation.org
>>> https://lists.linux-foundation.org/mailman/listinfo/openais
>>
>> This bug is fixed in commit:
> 
> The assert looks like this one:
> 
> http://marc.info/?l=openais&m=129647667713161&w=2
> 
> Or is it that this patch fixes that one too?
> 

You're right, there was a FAILED TO RECV in AP's logs.  That is a separate
bug, which is not yet fixed but has a workaround of increasing seqno:

https://bugzilla.redhat.com/show_bug.cgi?id=671575


> Thanks,
> 
> Dejan
> 
>> commit 96fa74175b0efad6909bfff91f5948f4e8080768
>> Author: Steven Dake <sdake at redhat.com>
>> Date:   Fri Mar 4 12:55:54 2011 -0700
>>
>>     Fix abort when token is lost in RECOVERY state
>>
>>     A commit token should be rejected when a token is lost in the recovery
>>     state.  This occurs naturally because the ring id increases by 4 for
>>     every new ring.  Prior to this patch, if the token was lost, the old
>>     ring id information was restored, causing a commit token to be accepted
>>     when it should be rejected.  This erroneously accepted commit token would
>>     lead to an assertion which is fixed by this patch.
>>
>>     Signed-off-by: Steven Dake <sdake at redhat.com>
>>     Reviewed-by: Angus Salkeld <asalkeld at redhat.com>
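
For readers following the commit message quoted above, here is a toy C model of the ring id comparison it describes. It is a sketch under assumptions, not the totemsrp code: the struct, the function names, and the "accept only a strictly newer ring id" rule are all hypothetical. Only the increment of 4 per new ring comes from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: every name here is hypothetical, not corosync's code. */

#define RING_SEQ_STEP 4u  /* per the commit message: +4 for every new ring */

struct toy_node {
    uint64_t ring_seq;      /* ring id sequence the node currently holds */
    uint64_t old_ring_seq;  /* ring id sequence of the last operational ring */
};

/* Begin forming a new ring: remember the old id, then advance by 4. */
static void toy_start_new_ring(struct toy_node *n)
{
    n->old_ring_seq = n->ring_seq;
    n->ring_seq += RING_SEQ_STEP;
}

/*
 * Token lost while in the recovery state.
 * restore_old_id = true models the pre-patch behavior described in the
 * commit message; false models keeping the advanced ring id.
 */
static void toy_token_lost_in_recovery(struct toy_node *n, bool restore_old_id)
{
    if (restore_old_id) {
        n->ring_seq = n->old_ring_seq;
    }
}

/* Assumed rule: accept a commit token only if it names a newer ring. */
static bool toy_accept_commit_token(const struct toy_node *n, uint64_t token_seq)
{
    return token_seq > n->ring_seq;
}
```

Take a node at sequence 8 that starts forming ring 12 and then loses the token. In this model a stale commit token for ring 12 is rejected if the node keeps 12, but accepted if the node falls back to 8, which is the situation the patch is described as preventing.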
