[Openais] Checkpoint crash in aisexec
Kristen Smith
kjsmith at nortel.com
Fri Feb 11 13:55:25 PST 2005
Steve,
We are periodically seeing aisexec crash with the following trace:
(gdb) bt
#0 message_handler_req_lib_ckpt_checkpointclose (conn_info=0x0, message=0xb73fc008) at ckpt.c:1552
#1 0x080494c2 in poll_handler_libais_deliver (handle=0, fd=3, revent=134633824, data=0x89c2ad8, prio=0x89b2784) at main.c:578
#2 0x08056e62 in poll_run (handle=0) at aispoll.c:386
#3 0x080499ac in main (argc=1, argv=0xbfffcb64) at main.c:1003
We have looked through the code but can't figure out how conn_info ends up
as 0 (a NULL pointer). Do you have any idea under what circumstances
conn_info could be NULL when this function is called?
This happens when we have multiple nodes up and kill one of the active
nodes. The standby node (which was reading checkpoints) must then become a
writer, so it closes the checkpoint, and that is when the crash occurs.
Unfortunately, I can't reproduce this consistently; I finally got a core
dump today. I don't recall ever seeing this with the old code.
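One way to keep aisexec from faulting while the root cause is tracked down is a defensive check before a library handler is invoked. The following is a minimal sketch under stated assumptions: `struct conn_info` is reduced to a single field, and `deliver_lib_message`, `lib_handler_fn`, and `handle_checkpoint_close` are hypothetical names, not the openais API. It does not explain how conn_info becomes NULL; it only avoids dereferencing it.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for the aisexec per-connection state;
   the real openais struct conn_info has many more fields. */
struct conn_info {
    int fd;
};

/* Hypothetical handler signature, modeled on frame #0 of the backtrace. */
typedef int (*lib_handler_fn)(struct conn_info *conn_info, void *message);

/* Hypothetical stand-in for message_handler_req_lib_ckpt_checkpointclose():
   it dereferences conn_info unconditionally, so a NULL would crash here. */
int handle_checkpoint_close(struct conn_info *conn_info, void *message)
{
    (void)message;              /* message contents unused in this sketch */
    return conn_info->fd;
}

/* Guarded dispatch: report and drop the request instead of calling the
   handler when the connection state is already gone. */
int deliver_lib_message(lib_handler_fn handler,
                        struct conn_info *conn_info, void *message)
{
    assert(handler != NULL);    /* a missing handler is a programming error */
    if (conn_info == NULL) {
        fprintf(stderr, "dropping library request: conn_info is NULL\n");
        return -1;
    }
    return handler(conn_info, message);
}
```

A caller passes the handler, the connection pointer, and the message; with a NULL conn_info the function logs the event and returns -1 rather than crashing, which hides the symptom but leaves the underlying lifetime problem to be found.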
Thanks,
Kristen