[Openais] corosync 1.2.5 still doesn't shutdown properly

Steven Dake sdake at redhat.com
Wed Jun 23 10:39:56 PDT 2010


On 06/22/2010 11:22 PM, Alain.Moulle wrote:
> Hi,
> Whatever the release (currently corosync-1.2.1-2.el6.x86_64),
> I always have trouble stopping corosync. Each time, the stop
> failed when crm_mon was reporting failed actions.
> Regards
> Alain

Please give 1.2.5 a try.  I am not familiar with crm warnings triggering
shutdown failures, but I can't make corosync+pacemaker lock up during
startup/shutdown in 2k iterations on single-CPU or multi-CPU systems.
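
For anyone who wants to run a similar loop, here is a minimal sketch.  It
is not the actual test harness: the function name stress_cycle, the
SERVICE_CMD and STEP_TIMEOUT variables, and the 60 second limit are all
assumptions made for illustration, and it relies on the coreutils
"timeout" command being installed.

```shell
#!/bin/sh
# Hypothetical stress loop: start and stop the service N times and
# return non-zero at the first step that fails or exceeds the time limit.
# SERVICE_CMD and STEP_TIMEOUT are illustrative names, not corosync settings.
stress_cycle() {
    iterations=$1
    cmd=${SERVICE_CMD:-service corosync}
    limit=${STEP_TIMEOUT:-60}
    i=1
    while [ "$i" -le "$iterations" ]; do
        # timeout(1) kills the step and exits non-zero if it runs past $limit seconds
        timeout "$limit" $cmd start || { echo "start failed at iteration $i" >&2; return 1; }
        timeout "$limit" $cmd stop  || { echo "stop failed at iteration $i" >&2; return 1; }
        i=$((i + 1))
    done
    echo "completed $iterations iterations"
}
```

Calling "stress_cycle 2000" as root would cover roughly the 2k iterations
mentioned above; a stop that hangs shows up as "stop failed" once the
time limit expires.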

Regards
-steve

>> On 06/22/2010 03:56 AM, Vadym Chepkov wrote:
>>> >  Hi,
>>> >
>>> >  I decided to check if I can start using corosync again on several of
>>> >  my clusters (have to use heartbeat there at the moment).
>>> >  I don't even have any services defined in corosync.conf, commented
>>> >  pacemaker out, just plain corosync and it never goes down:
>>> >
>>> >  # ps axf|grep corosync
>>> >  26294 pts/0    S+     0:00  |               \_ /bin/sh /sbin/service corosync restart
>>> >  26299 pts/0    S+     0:01  |                   \_ /bin/bash /etc/init.d/corosync restart
>>> >  29249 pts/1    S+     0:00                  \_ grep corosync
>>> >  25959 ?        Ssl    0:00 corosync
>>> >
>>> >
>>> >  I attached to the process and this is where it hangs:
>>> >
>>> >  (gdb) where
>>> >  #0  0x0fe14134 in poll () from /lib/libc.so.6
>>> >  #1  0x0ffbc530 in poll_run (handle=150346236434579456) at coropoll.c:413
>>> >  #2  0x10006e50 in main (argc=<value optimized out>, argv=<value optimized out>) at main.c:1576
>>> >
>>> >  How can I help to debug this problem?
>>> >  It is 100% reproducible.
>>> >
>>> >  Thank you,
>>> >  Vadym
>>> >  ________
>>
>> Vadym,
>>
>> Thanks for the feedback.  I do test this scenario and it works for me:
>>
>> [root@cast flatiron]# service corosync start
>> Starting Corosync Cluster Engine (corosync):               [  OK  ]
>> [root@cast flatiron]# service corosync restart
>> Signaling Corosync Cluster Engine (corosync) to terminate: [  OK  ]
>> Waiting for corosync services to unload:.                  [  OK  ]
>> Starting Corosync Cluster Engine (corosync):               [  OK  ]
>> [root@cast flatiron]# service corosync stop
>> Signaling Corosync Cluster Engine (corosync) to terminate: [  OK  ]
>> Waiting for corosync services to unload:.                  [  OK  ]
>> [root@cast flatiron]# service corosync start
>> Starting Corosync Cluster Engine (corosync):               [  OK  ]
>> [root@cast flatiron]# /etc/init.d/corosync restart
>> Signaling Corosync Cluster Engine (corosync) to terminate: [  OK  ]
>> Waiting for corosync services to unload:.                  [  OK  ]
>> Starting Corosync Cluster Engine (corosync):               [  OK  ]
>>
>>
>> One thing that would stop corosync from shutting down is if it couldn't
>> enter the operational state.  This often happens because a firewall is
>> blocking the ports corosync uses to communicate.
>>
>> The system logs would be helpful (with debug: on).
>>
>> Regards
>> -steve