[Openais] recover from corosync daemon restart and cpg_finalize timing

Steven Dake sdake at redhat.com
Thu Jun 24 21:19:14 PDT 2010


On 06/24/2010 12:50 PM, dan clark wrote:
> Thank you for trying out this test.
>
> I have upgraded to release 1.2.5 and applied the fix posted for the
> leak to /dev/shm.  Unfortunately when I run the test application
> (slightly modified to fix a couple of bugs I found) I still find
> /dev/shm filling up with large files "control_buffer-xxx,
> dispatch_buffer-xxx, fdata-xxxx, request_buffer_xxx,
> response_buffer_xxx" even after corosync is restarted and the
> application daemon killed.  It would appear that there may still be a
> problem in the cleanup of the temporary files used by corosync
> (library and daemon?) in /dev/shm.
>

The typical POSIX model for shared memory cleanup works as follows:
process 1 (libcoroipcc) creates shared memory segments
process 1 tells corosync the names of the shared memory segments
process 2 (corosync) opens the shared memory segments, unlinks them,
then mmaps them

The unlink marks the file as pending deletion.  Once all processes stop
using the file (i.e., they close the file, or the process crashes and the
OS closes the file), the file will be garbage collected by the shared
memory file system.  I believe what is happening in this case is that
either finalize is not called, or some error in finalize prevents the
mmapped file segments from being closed.
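
To illustrate the unlink-then-mmap pattern described above, here is a
minimal single-process sketch.  It assumes the POSIX shm_open/shm_unlink
calls; the function name unlinked_shm_demo and the segment name passed to
it are hypothetical, and this is not the actual libcoroipcc or corosync
code, which may create and open its segments differently.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not corosync code): create a segment, unlink its
 * name, then map it.  Returns 0 if the name is gone while the mapping is
 * still usable, -1 on any error.
 */
int unlinked_shm_demo(const char *name, size_t size)
{
    /* "process 1" step: create the named segment and give it a size */
    int fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        shm_unlink(name);
        return -1;
    }

    /* "process 2" step: unlink the name, then mmap the open descriptor */
    if (shm_unlink(name) < 0) {
        close(fd);
        return -1;
    }
    char *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping keeps the memory alive */
    if (p == MAP_FAILED)
        return -1;

    /* the name no longer resolves, so nothing is left behind by name */
    int probe = shm_open(name, O_RDWR, 0);
    int name_gone = (probe < 0);
    if (probe >= 0)
        close(probe);

    /* the memory itself is still readable and writable */
    strcpy(p, "still mapped");
    int usable = (strcmp(p, "still mapped") == 0);

    /* storage is reclaimed once the last mapping goes away */
    munmap(p, size);
    return (name_gone && usable) ? 0 : -1;
}
```

Calling unlinked_shm_demo("/demo_segment", 4096) should return 0 on a
typical Linux system (older glibc versions may need -lrt at link time).
If the munmap, or the close of the descriptor in a process that never
mapped it, is skipped while the process keeps running, the storage stays
allocated, which is consistent with the kind of leak suspected here.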

I'll look into it early next week.  This is something we definitely 
would want to fix.

Shutdown/restart was not something we heavily focused on in early
development.  My hope is that corosync isn't restarted, but from the
traffic on the list relating to pacemaker and other users, this may be
naive.  Our community has for the past 5-6 months been sorting out these
restart/shutdown use cases.  At this point, I believe we have a very
well thought out shutdown implementation; restart with existing clients
less so.

> Should the shutdown of the application (and associated corosync
> library) clean up the temporary files?  Should the shutdown of the
> daemon clean up the /dev/shm temporary files?  Would a stopgap measure
> be to rm -f /dev/shm/* in the init.d script to clean up any leftovers?
> Would that break the library if the applications were not also shut
> down?
>

I can't recommend rm -f /dev/shm/*.  It could potentially harm any
application that uses the POSIX shm_open call or otherwise relies on
multi-process shared memory.

Give us a little time to look at it.

Regards
-steve
> dan
>


