[Openais] corosync 1.2.5 still doesn't shutdown properly

Steven Dake sdake at redhat.com
Tue Jun 22 11:21:20 PDT 2010


On 06/22/2010 11:07 AM, Vadym Chepkov wrote:
> On Tue, Jun 22, 2010 at 1:49 PM, Steven Dake<sdake at redhat.com>  wrote:
>> On 06/22/2010 03:56 AM, Vadym Chepkov wrote:
>>>
>>> Hi,
>>>
>>> I decided to check if I can start using corosync again on several of
>>> my clusters (have to use heartbeat there at the moment).
>>> I don't even have any services defined in corosync.conf, commented
>>> pacemaker out, just plain corosync and it never goes down:
>>>
>>> # ps axf|grep corosync
>>> 26294 pts/0    S+     0:00  |               \_ /bin/sh /sbin/service corosync restart
>>> 26299 pts/0    S+     0:01  |                   \_ /bin/bash /etc/init.d/corosync restart
>>> 29249 pts/1    S+     0:00                  \_ grep corosync
>>> 25959 ?        Ssl    0:00 corosync
>>>
>>>
>>> I attached to the process and this is where it hangs:
>>>
>>> (gdb) where
>>> #0  0x0fe14134 in poll () from /lib/libc.so.6
>>> #1  0x0ffbc530 in poll_run (handle=150346236434579456) at coropoll.c:413
>>> #2  0x10006e50 in main (argc=<value optimized out>, argv=<value optimized out>) at main.c:1576
>>>
>>> How can I help to debug this problem?
>>> It is 100% reproducible.
>>>
>>> Thank you,
>>> Vadym
>>> ________
>>
>> Vadym,
>>
>> Thanks for the feedback.  I do test this scenario and it works for me:
>>
>> [root at cast flatiron]# service corosync start
>> Starting Corosync Cluster Engine (corosync):               [  OK  ]
>> [root at cast flatiron]# service corosync restart
>> Signaling Corosync Cluster Engine (corosync) to terminate: [  OK  ]
>> Waiting for corosync services to unload:.                  [  OK  ]
>> Starting Corosync Cluster Engine (corosync):               [  OK  ]
>> [root at cast flatiron]# service corosync stop
>> Signaling Corosync Cluster Engine (corosync) to terminate: [  OK  ]
>> Waiting for corosync services to unload:.                  [  OK  ]
>> [root at cast flatiron]# service corosync start
>> Starting Corosync Cluster Engine (corosync):               [  OK  ]
>> [root at cast flatiron]# /etc/init.d/corosync restart
>> Signaling Corosync Cluster Engine (corosync) to terminate: [  OK  ]
>> Waiting for corosync services to unload:.                  [  OK  ]
>> Starting Corosync Cluster Engine (corosync):               [  OK  ]
>>
>>
>> One thing that would stop corosync from shutting down is if it couldn't
>> enter operational state.  This often happens because of a firewall enabled
>> on the ports corosync uses to communicate.
>>
>> The system logs would be helpful (with debug: on).
>>
>> Regards
>> -steve
>
>
> And it works fine on Intel based servers, but on Redhat PPC based
> server it doesn't
>
> I attached the config and the log file
>
> Thanks,
> Vadym

Nothing jumps out from the logs.  Thanks for the pointer about ppc.
I'll hunt down some PPC hardware and see if I can reproduce/fix.  Could
you be more specific about which ppc (32 or 64) you were running?  Were
you running big-endian (BE) and little-endian (LE) nodes in the same
cluster?
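
If it is easier than looking it up by hand, something along these lines
run on each node should report the details asked about above.  This is
only a rough sketch and not part of corosync; the function name
describe_host is made up for illustration, and the pointer size it
reports is that of the Python interpreter, which I am assuming matches
the userland corosync was built for.

```python
import platform
import struct
import sys


def describe_host():
    """Collect the architecture details relevant to a ppc bug report."""
    return {
        # Machine type as reported by the kernel, e.g. "ppc" or "ppc64".
        "machine": platform.machine(),
        # Pointer width of this Python build: 32 or 64.
        "pointer_bits": struct.calcsize("P") * 8,
        # Native byte order: "big" (BE) or "little" (LE).
        "byte_order": sys.byteorder,
    }


if __name__ == "__main__":
    for key, value in sorted(describe_host().items()):
        print("%s: %s" % (key, value))
```

Comparing the byte_order line across nodes would show whether the
cluster mixes BE and LE machines.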

Please be patient, however.  I don't have any ppc hardware personally, 
and getting access to non-x86 hardware may take me a few days.

Regards
-steve

