[Openais] Re: ipc rewrite

Mark Haverkamp markh at osdl.org
Wed Apr 26 10:48:46 PDT 2006

On Tue, 2006-04-25 at 16:16 -0700, Steven Dake wrote:
> The events dropped might be because of priority inversion of the
> subscription and publish tests.  They should be set to sched-rr:1.  Look
> at evsbench.  Eventually this will be resolved in a later patch so that
> priorities are automatically determined.  Let me know what tests you are
> running to get the "lockup" and I'll see what is wrong with the ipc.
> evsbench seems to work properly, which is the only way I tested this.
> What was the test case for the double free?
> With the new code, it will be difficult to run aisexec within gdb
> because the ipc code will often call pthread_kill to interrupt the poll
> when the outbound kernel queue is full (this interrupts gdb too, sigh).
> I'd recommend ulimit -c unlimited to create core files and then use
> gdb ./aisexec corefile
> You can use thread 1, thread 2, etc. to switch between threads and get
> backtraces.
> I realize this adds extra complication for the developers but it should
> pay off in the end.

I think I have a clue as to what is going on.  I added some debug output
where events are queued for delivery and where the application requests
them.  My event count variable is somehow getting out of sync with the
number of events actually on the queue.  The stack trace shows that clone
is called and that the delivery function is then run by another thread.
Since the event code has no mutexes, my guess is that the event
processing and delivery functions now race with each other and leave my
data structures inconsistent.  Does this sound reasonable?
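
For illustration, here is a minimal sketch of the kind of locking that
would keep a count and its queue in step once more than one thread
touches them.  It is not the actual event service code: struct
event_queue, queue_push, queue_pop, queue_is_consistent and run_demo are
all hypothetical names, and the only assumption is that one pthread
mutex guards the list and the counter together.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical event queue; names are illustrative, not from openais. */
struct event {
    struct event *next;
    int id;
};

struct event_queue {
    pthread_mutex_t lock;   /* guards head, tail and count together */
    struct event *head;
    struct event *tail;
    unsigned int count;     /* must always equal the list length */
};

static void queue_init(struct event_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    q->head = q->tail = NULL;
    q->count = 0;
}

/* Delivery side: link the event and bump the count in one critical section. */
static int queue_push(struct event_queue *q, int id)
{
    struct event *ev = malloc(sizeof(*ev));
    if (ev == NULL)
        return -1;
    ev->id = id;
    ev->next = NULL;
    pthread_mutex_lock(&q->lock);
    if (q->tail != NULL)
        q->tail->next = ev;
    else
        q->head = ev;
    q->tail = ev;
    q->count++;
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Request side: returns 1 and stores the id, or 0 if the queue is empty. */
static int queue_pop(struct event_queue *q, int *id)
{
    struct event *ev;
    pthread_mutex_lock(&q->lock);
    ev = q->head;
    if (ev != NULL) {
        q->head = ev->next;
        if (q->head == NULL)
            q->tail = NULL;
        q->count--;
    }
    pthread_mutex_unlock(&q->lock);
    if (ev == NULL)
        return 0;
    *id = ev->id;
    free(ev);
    return 1;
}

/* Debug check: walk the list under the lock and compare with count. */
static int queue_is_consistent(struct event_queue *q)
{
    unsigned int n = 0;
    int ok;
    pthread_mutex_lock(&q->lock);
    for (struct event *ev = q->head; ev != NULL; ev = ev->next)
        n++;
    ok = (n == q->count);
    pthread_mutex_unlock(&q->lock);
    return ok;
}

#define EVENTS_PER_THREAD 1000u

static void *producer(void *arg)
{
    struct event_queue *q = arg;
    for (unsigned int i = 0; i < EVENTS_PER_THREAD; i++)
        queue_push(q, (int)i);
    return NULL;
}

/* Two producers race with a consumer; returns 1 if nothing went missing
 * (assumes every malloc in queue_push succeeded). */
static int run_demo(void)
{
    struct event_queue q;
    pthread_t t[2];
    unsigned int popped = 0;
    int id, ok;

    queue_init(&q);
    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, producer, &q);
    for (unsigned int i = 0; i < EVENTS_PER_THREAD; i++)
        popped += (unsigned int)queue_pop(&q, &id);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);

    ok = queue_is_consistent(&q) &&
         popped + q.count == 2u * EVENTS_PER_THREAD;
    while (queue_pop(&q, &id))
        ;   /* drain what is left so nothing leaks */
    pthread_mutex_destroy(&q.lock);
    return ok;
}
```

Calling run_demo() from a small main() and building with -pthread should
return 1.  If the lock and unlock calls are taken out, the count and the
list length can drift apart in the way described above.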


Mark Haverkamp <markh at osdl.org>
