[Openais] aisexec core dump during traffic

Kristen Smith kjsmith at nortel.com
Tue Feb 15 06:57:34 PST 2005


Bug 256 has been filed.

Also - the value of the memb_state global:

(gdb) p memb_state
$1 = MEMB_STATE_OPERATIONAL

-----Original Message-----
From: Steven Dake [mailto:sdake at mvista.com] 
Sent: Monday, February 14, 2005 2:06 PM
To: Smith, Kristen [NGC:B675:EXCH]
Cc: 'openais at lists.osdl.org'; Bajpai, Muni [NGC:B670:EXCH]
Subject: Re: [Openais] aisexec core dump during traffic


Kristen,

OK, I think I have duplicated this in the past but don't have an immediate
solution.  Basically, during recovery the token is lost, which transitions
the processor back to gather.  Then, in gather, the processor may multicast
messages, which queues new messages in place of the old ones.  This results
in the fault you see.
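
A minimal sketch of the failure mode described above, assuming a simplified
sort queue in which each sequence number maps to one slot and a slot may be
filled only once. This is not the actual sq.h code; the struct layout,
SQ_SIZE, and sq_try_item_add are hypothetical names chosen for illustration.
Only the assertion condition (items_inuse[sq_position] == 0) comes from the
report.

```c
#include <assert.h>
#include <string.h>

#define SQ_SIZE 8 /* hypothetical queue size, for illustration only */

/* Hypothetical, simplified sort queue: one slot per sequence number. */
struct sq {
    unsigned char items_inuse[SQ_SIZE]; /* 1 if the slot holds a message */
    int items[SQ_SIZE];                 /* stand-in for message payloads */
};

static void sq_init(struct sq *sq)
{
    memset(sq, 0, sizeof(*sq));
}

/*
 * Non-aborting variant: returns 0 on success, or -1 in the situation
 * where the reported assertion would fire (slot already occupied).
 */
static int sq_try_item_add(struct sq *sq, int item, unsigned int seq)
{
    unsigned int sq_position = seq % SQ_SIZE;

    if (sq->items_inuse[sq_position] != 0) {
        return -1; /* a message is already queued at this position */
    }
    sq->items[sq_position] = item;
    sq->items_inuse[sq_position] = 1;
    return 0;
}

/* Asserting variant, mirroring the condition from the crash report. */
static void sq_item_add(struct sq *sq, int item, unsigned int seq)
{
    int rc = sq_try_item_add(sq, item, seq);

    assert(rc == 0);
    (void)rc; /* silence unused warning when NDEBUG is defined */
}
```

In this sketch, queueing a message at sequence number 5 and then queueing a
new message at the same sequence number before the old one is released makes
sq_try_item_add return -1, which is the condition under which sq_item_add
would abort.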

You could have another error; it's difficult to say.  Could you print the
memb_state global variable in gdb?

Please file a defect on this one so we can track it.

Thanks
-steve

On Sun, 2005-02-13 at 08:25, Kristen Smith wrote:
> Steve,
> 
> Running traffic this weekend (in a 3+1 configuration - each of the 
> active nodes was writing out ~6 ckpts/second). Ran for about 20 hours 
> and then got the following from aisexec (on one of the active nodes):
> 
> aisexec: ../include/sq.h:102: sq_item_add: Assertion `sq->items_inuse[sq_position] == 0' failed.
> 
> and a trace:
> 
> #0  0x00bebcdf in raise () from /lib/tls/libc.so.6
> #1  0x00bed4e5 in abort () from /lib/tls/libc.so.6
> #2  0x00be5609 in __assert_fail () from /lib/tls/libc.so.6
> #3  0x0805add1 in orf_token_mcast (token=0xbfffce00, fcc_mcasts_allowed=29, system_from=0xbfffd420) at totemsrp.c:1990
> #4  0x080587e6 in message_handler_orf_token (system_from=0xbfffd420, iovec=0xbfffce00, iov_len=1, bytes_received=78, endian_conversion_needed=0) at totemsrp.c:2702
> #5  0x0805a3d9 in recv_handler (handle=0, fd=7, revents=1, data=0x0, prio=0x0) at totemsrp.c:3351
> #6  0x08056e62 in poll_run (handle=0) at aispoll.c:386
> #7  0x080499ac in main (argc=1, argv=0xbfffd634) at main.c:1003
> 
> This is the bitkeeper code from last Monday.
> 
> Here are the #defines I have changed, if that matters at all:
> 
> #define TIMEOUT_STATE_GATHER_JOIN       40
> #define TIMEOUT_STATE_GATHER_CONSENSUS  80
> #define TIMEOUT_TOKEN                   180
> #define TIMEOUT_TOKEN_RETRANSMIT        30
> 
> Any other information I can provide for you?
> 
> Thanks,
> Kristen
> 
> 
> 

