[Openais] Re: Some event service code

Steven Dake sdake at mvista.com
Thu Jul 29 12:10:34 PDT 2004

On Thu, 2004-07-29 at 08:15, Mark Haverkamp wrote:
> On Wed, 2004-07-28 at 16:13, Steven Dake wrote:
> > Awesome.  Comments below...
> > 
> > On Wed, 2004-07-28 at 15:29, Mark Haverkamp wrote:
> > > Here is a patch for the event service.  The library side is pretty far
> > > along. The aisexec side has placeholders that just print something and
> > > return status.  I just wanted to run this by you for a review before I
> > > get too much farther into it.  I hope the coding style isn't too far
> > > off.  I've tried to reuse as much of my existing code as I could.  It's
> > > almost 2800 lines so rather than attach it, I put it here:
> > > 
> > > http://developer.osdl.org/markh/openais/openais_event_20040728.patch
> > > 
> > > Thanks,
> > > Mark.
> > 
> > Mark
> > 
> > Wow I must say I am impressed.  Excellent work !!
> > 
> > Thanks for the dependencies..  That was driving me crazy when I edited
> > the h files.
> > 
> > I do have a few comments of course :)
> > 
> > On the library:
> > 
> > I'd like the tabstops to look good at 4.  I think you are using tabstops
> > of 8.  This is a minor thing.  I think there is a way to tell vi to
> > force tabstops when it loads a file.  Do you know how this works?
> I've always used tabstops of 8.  I didn't notice that your code was set
> up for tabs at 4 since your code seems to look OK at 8. I don't know how
> to set the tabstops at file load.  I think that there were some kernel
> files that had some emacs comments to do that, I'm not sure about vi.
> It looks like the only thing that looks funny is that the structure
> declarations don't line up well at 4.    I'll run through the files and
> line things up. 
> > 
> > we could probably put clust_time_now in ais_util.h/.c
> Is that util.[ch] in the lib directory or new files?
> > 
> > In initialize, SaServiceConnect should free the queue allocated with
> > saQueueInit if there is an error.
> > 
> > In channel open, you don't do a handle put if handlecreate fails
> > 
> > in channel close, the first handle instance isn't put if the second
> > handle instance get fails.
> Thanks, I knew that I probably missed a few things like that here and
> there.
yes I still find after reading the amf/ckpt/clm error paths, errors in
handle puts.

> > 
> > All functions not specifically exported should be static: patt_size,
> > aispatt_to_evt_patt, filt_size, aisfilt_to_evt_filt
> Good idea.  I normally try to do this.
> > 
> > I know some of the code in clm, amf, checkpointing may not follow these
> > static guidelines, so if you see anything in any of the code, let me
> > know so I can fix it up.
> > 
> > It might help parallel style a little bit if the header names are
> > "header" vs ico_head, etc.  This definately isn't a big deal though.
> The thing that I like about the prefix names is that when I use cscope
> to browse the source code, I can easily distinguish different structure
> elements.

ok whatever helps you develop/maintain that code is best for us.

> > 
> > You will have to change your executive handlers to take the struct
> > in_addr parameter in the latest tree.
> OK.
> > 
> > One big problem to be avoided is duplicating services in several
> > places.  When I started openais some time ago, I had
> > amfComponentRegister, and there were two versions.  One for the library
> > which registered a component, and one for the executive.  This creates a
> > race condition.  If you send the message over the executive and handle
> > the request in one place "in your TODO: Add open code here" the race
> > conditions can be avoided  This can be applied all over the executive.
> Could you explain this again?  I'm not sure I understand what you are
> saying here.

Sorry I was in a hurry to teambuilding yesterday, so I wasn't clear.

Let me explain what happened in the past first (pre openais release).

Lets say that an AMF application wanted to register a component.  It
would call amfCompnoentRegister.  This would send a message to the
executive from the library.  The executive would get the request, and
process it in its library handler.  Then it would send the request to
every other node, which would process it in its executive handler.  I
use the term's library handler and executive handler loosely because the
code base prior to openais was not very well organized in this regard
(with a seperate service handler structure).  The code (for most of its
life) prior to openais also didn't self deliver messages.

So in this scenario, what happens if two processors register the same
component?  They can race for the registration.  In fact, it could end
up that each processor thinks it has registered the same component. 
This is bad news for any kind of state machine, since it violates the
states (unless of course you plan this to happen, then you can have a
state for it).

One nice advantage of the current gmi code is that it self-delivers
messages.  If you multicast a message, it will be delivered to everyone,
including the sender of the message.  This can be used to put the
"processing" of the operation in one place: the executive message
handler.  self delivery is pretty nice, because messages are self
delivered in agreed order.

This feature, combined with deferring responses, can be used to ensure
race-free registration of new components.  Of course, this concept
should be extended to other sorts of operations like creating channels,
deleting channels, changing timeouts of events, or whatever.

Now, in openais, component register, unregister, unlink etc works like
the app does a comp register, the library sends the request, the library
handler in the executive sends a new request to the gmi.  The gmi orders
the message and self-delivers it in agreed order.  Then the delivery
function calls the executive message handler for the operation, and the
operation is processed in one place.  This is the crux of what I meant
to say in my last message.

Because messages arrive in agreed order, even if two race, the first
request will be processed, and the second request will be rejected. 
(first and second being in what order the messages were ordered on the
ring).  bye bye race conditions.

This does require deferring response (to the library) to an executive
message which is covered in README.devmap.

There are optimizations that can be done to avoid the ordering overhead
(and bandwidth limits of the network) but be wary of the above
possibility.  I'd rather it work correctly all the time then work faster
incorrectly most of the time:)


> > 
> > If you can fix up the library comments I'll merge that as one patch, or
> > if you want to wait for the entire thing we can.
> I'll update the code and send out another patch for you to merge,
> probably this afternoon.  Then I'll start filling in the server side
> code and send updates as things get working.  If that sounds OK.
> > 

sounds good Mark

> > I'm looking forward to see the remainder of the executive service
> > handler.
> > 
> Thanks for the review.
> Mark.
> > Regards
> > -steve

More information about the Openais mailing list