[Openais] corosync 1.3 crashes on totem loss

Dejan Muhamedagic dejan at suse.de
Tue Mar 15 08:27:38 PDT 2011


Hi,

On Tue, Mar 15, 2011 at 07:46:30AM -0700, Steven Dake wrote:
> On 03/14/2011 06:05 PM, AP wrote:
> > Hi,
> > 
> > Just had severe network flakiness here and found corosync vanishing from
> > the process list on one of the nodes. Initially this was due to packet loss
> > but just now it was due to multicast not being enabled properly so that
> > the node in question could send multicast packets but not receive them.
> > 
> > Attached is the corosync-fplay output as well as a bt full of the core
> > file. The OS is Debian squeeze (libc 2.11.2), kernel 2.6.37.2.
> > 
> > AP
> > 
> > 
> > 
> > _______________________________________________
> > Openais mailing list
> > Openais at lists.linux-foundation.org
> > https://lists.linux-foundation.org/mailman/listinfo/openais
> 
> This bug is fixed in commit:

The assert looks like this one:

http://marc.info/?l=openais&m=129647667713161&w=2

Or does this patch fix that one too?

Thanks,

Dejan

> commit 96fa74175b0efad6909bfff91f5948f4e8080768
> Author: Steven Dake <sdake at redhat.com>
> Date:   Fri Mar 4 12:55:54 2011 -0700
> 
>     Fix abort when token is lost in RECOVERY state
> 
>     A commit token should be rejected when a token is lost in the recovery
>     state.  This occurs naturally because the ring id increases by 4 for
>     every new ring.  Prior to this patch, if the token was lost, the old
>     ring id information was restored, causing a commit token to be accepted
>     when it should be rejected.  This erroneously accepted commit token would
>     lead to an assertion failure, which this patch fixes.
> 
>     Signed-off-by: Steven Dake <sdake at redhat.com>
>     Reviewed-by: Angus Salkeld <asalkeld at redhat.com>
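The ring-id reasoning in the commit message can be illustrated with a toy model. This is a minimal sketch, not corosync's implementation: the names ToyNode, replay and RING_SEQ_STEP, the starting ring id of 8, and the simplified acceptance rule ("accept a commit token only if its ring id is newer than ours") are all hypothetical. Only the step of 4 per new ring and the restore-on-token-loss behaviour come from the commit message.

```python
"""Toy model (hypothetical, not corosync code) of the ring-id check
described in commit 96fa74175b0efad6909bfff91f5948f4e8080768."""

RING_SEQ_STEP = 4  # from the commit message: the ring id grows by 4 per new ring


class ToyNode:
    def __init__(self, ring_seq: int, restore_old_ring_on_loss: bool) -> None:
        self.ring_seq = ring_seq
        # True mimics the pre-patch behaviour, False the patched behaviour.
        self.restore_old_ring_on_loss = restore_old_ring_on_loss
        self._old_ring_seq = ring_seq

    def enter_recovery(self) -> int:
        """Install the new ring id; return the seq its commit token carries."""
        self._old_ring_seq = self.ring_seq
        self.ring_seq += RING_SEQ_STEP
        return self.ring_seq

    def lose_token_in_recovery(self) -> None:
        if self.restore_old_ring_on_loss:
            # Pre-patch: the old ring id information is restored.
            self.ring_seq = self._old_ring_seq
        # Patched: the newer ring id is kept, so stale tokens compare as old.

    def accepts_commit_token(self, token_seq: int) -> bool:
        # Simplified, hypothetical rule: only a newer ring's token is accepted.
        return token_seq > self.ring_seq


def replay(restore_old_ring_on_loss: bool) -> bool:
    """Return whether a stale commit token is accepted after a token loss."""
    node = ToyNode(ring_seq=8, restore_old_ring_on_loss=restore_old_ring_on_loss)
    stale_token_seq = node.enter_recovery()  # commit token for ring 12
    node.lose_token_in_recovery()
    return node.accepts_commit_token(stale_token_seq)


if __name__ == "__main__":
    print("pre-patch accepts stale commit token:", replay(True))   # True
    print("patched accepts stale commit token:", replay(False))    # False
```

In the pre-patch case the node falls back to ring id 8, so the abandoned ring's commit token (id 12) still looks newer and is accepted. With the newer ring id kept, the same token is no longer newer and is rejected, which is the "occurs naturally" behaviour the commit message describes.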

