[Openais] Checkpoint crash in aisexec

Kristen Smith kjsmith at nortel.com
Fri Feb 11 13:55:25 PST 2005


Steve,

We are periodically seeing aisexec crash with the following trace:

	(gdb) bt
	#0  message_handler_req_lib_ckpt_checkpointclose (conn_info=0x0, message=0xb73fc008) at ckpt.c:1552
	#1  0x080494c2 in poll_handler_libais_deliver (handle=0, fd=3, revent=134633824, data=0x89c2ad8, prio=0x89b2784) at main.c:578
	#2  0x08056e62 in poll_run (handle=0) at aispoll.c:386
	#3  0x080499ac in main (argc=1, argv=0xbfffcb64) at main.c:1003

We have looked through the code but can't seem to figure out how conn_info
is getting set to 0. Do you have any idea under what circumstances conn_info
could be null when this function is called?

This happens when multiple nodes are up and we kill one of the active
nodes. The standby node (which was reading checkpoints) must then become
a writer, so it closes the checkpoint, and aisexec crashes during that
close. Unfortunately, I can't reproduce this consistently; I only managed
to get a core dump today. I don't recall ever seeing this with the old code.
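
For what it's worth, a guard along the following lines would at least turn
the crash into a logged error while the root cause is tracked down. This is
only a sketch under assumptions: struct conn_info here is a stand-in with a
made-up field, and checkpointclose_guarded is a hypothetical name shaped like
frame #0 of the trace, not the actual handler in ckpt.c.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for the executive's per-connection state;
 * the real structure in aisexec has different contents. */
struct conn_info {
    int fd;
};

/* Hypothetical handler with the same argument shape as frame #0.
 * Returns 0 on success, -1 if the connection state is missing. */
static int checkpointclose_guarded(struct conn_info *conn_info, void *message)
{
    if (conn_info == NULL) {
        /* Log and refuse instead of dereferencing a null pointer. */
        fprintf(stderr, "checkpointclose: conn_info is NULL (message=%p)\n",
                message);
        return -1;
    }

    /* ... the real close handling would go here ... */
    return 0;
}
```

This does not explain why conn_info is null in the first place; it only
keeps the process alive long enough to log the condition.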

Thanks,
Kristen
