[Openais] Re: evt update for retained event recovery on config change

Mark Haverkamp markh at osdl.org
Fri Sep 24 15:17:18 PDT 2004


On Fri, 2004-09-24 at 14:43, Steven Dake wrote:
> Mark & Daniel,
> 
> Clearly we are getting to the hard part (minus the low level
> communication of course:-) of implementing the AIS specification...  I
> think this is an excellent first shot at merge recovery.  I have a few
> comments:
> 
> name_match is already implemented.  Use SaNameTisNameT.  Feel free to
> rename it to non-spazmodic style if you desire.  When I started the ais
> code, I thought the executive should follow the ais coding style, but I
> have changed my mind to something more like the kernel coding style.  I
> want to retain it for the lib directory to help debuggability though.
> I think parse.c is the correct place for this function today.  We probably
> want a util.c file to store our time related stuff and comparison
> operations.

Do you want a new comparison function in util.c, or should I move the
existing one? If you like, I can create util.c and start it with the
match and time functions.
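
A minimal sketch of the kind of length-aware name comparison that could
live in a util.c. It is not the existing openais function: it assumes a
local stand-in for the AIS SaNameT type (a 16-bit length plus a 256-byte
value buffer that is not NUL-terminated), and the name_set helper is
hypothetical, included only to build test values.

```c
#include <assert.h>
#include <string.h>

/* Local stand-in mirroring the AIS SaNameT layout (assumption). */
#define SA_MAX_NAME_LENGTH 256

typedef struct {
    unsigned short length;
    unsigned char value[SA_MAX_NAME_LENGTH];
} SaNameT;

/* Returns 1 when both names have the same length and the same bytes,
 * 0 otherwise. Only the first 'length' bytes are compared, since the
 * value buffer is not NUL-terminated. */
int name_match(const SaNameT *a, const SaNameT *b)
{
    assert(a != NULL && b != NULL);
    if (a->length != b->length)
        return 0;
    return memcmp(a->value, b->value, a->length) == 0;
}

/* Hypothetical helper: copy a C string into an SaNameT,
 * truncating it to SA_MAX_NAME_LENGTH bytes if necessary. */
void name_set(SaNameT *n, const char *s)
{
    size_t len = strlen(s);
    if (len > SA_MAX_NAME_LENGTH)
        len = SA_MAX_NAME_LENGTH;
    n->length = (unsigned short)len;
    memcpy(n->value, s, len);
}
```

Comparing the length first avoids treating a name as equal to one of
its own prefixes.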

> 
> I have a feeling we could genericize the hashing of the node data
> structure and add it somehow to exec/clm.c.  This is not a high priority
> at the moment, perhaps something we can address next year.
> 
> Your approach to distributing events is clever (selecting the oldest
> boot time).
> 
> Would it help to know the previous configuration of every new member in
> the configuration change?  This way, each configuration could select one
> member from the old synchronized set to synchronize its events to the
> new configuration.  I had kicked this idea around for checkpointing
> sync, but haven't got to it yet.

This is what Daniel and I have been kicking around.  We thought that
keeping track of the previous configuration would allow the oldest node
from each partition to distribute that partition's retained events,
giving a more accurate distribution of retained events.
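
A minimal sketch of "oldest node per previous partition" selection. The
struct and its fields (node_id, boot_time, prev_ring) are hypothetical;
the sketch assumes every member evaluates the same membership list after
the configuration change, so each node can decide locally, with no extra
messages, whether it is the one that redistributes retained events for
the partition it came from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-member record; field names are illustrative only. */
struct member_info {
    uint32_t node_id;
    uint64_t boot_time;   /* smaller value = node has been up longer */
    uint32_t prev_ring;   /* id of the configuration the node came from */
};

/* Returns 1 if m has the oldest boot time among the members that share
 * its previous configuration, 0 otherwise. Ties on boot_time go to the
 * lowest node_id so that exactly one node per partition is chosen. */
int is_partition_distributor(const struct member_info *members,
                             size_t count, const struct member_info *m)
{
    assert(members != NULL && m != NULL);
    for (size_t i = 0; i < count; i++) {
        const struct member_info *o = &members[i];
        if (o->prev_ring != m->prev_ring || o->node_id == m->node_id)
            continue;
        if (o->boot_time < m->boot_time)
            return 0;
        if (o->boot_time == m->boot_time && o->node_id < m->node_id)
            return 0;
    }
    return 1;
}
```

Without prev_ring, the same loop reduces to the single oldest-boot-time
selection across the whole new configuration.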

> 
> What happens if a retained event is expired during a configuration
> change?  I know this window is small, but could next_retained at line
> 805 or so point to a deleted event and cause a segfault or some other
> undefined behavior?  I need to come up with a solution for checkpointing
> too, so an exchange would be helpful on this subject.
> 
> At line 2169, do you still see this happening?  I think it can still
> happen, but I'm not sure.  The changes needed to avoid it are not very
> elegant...

It can definitely happen, but I take that into account in the retained
events expire code.  If I'm deleting the next_retained event, I fix the
next_retained pointer at that time.
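
A minimal sketch of that fix-up, assuming a hypothetical doubly linked
list of retained events with a resend cursor named after next_retained;
the struct layouts and the retained_expire function are illustrative,
not the actual openais code. The cursor is advanced before the event is
unlinked and freed, so it never references freed memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical retained-event list node. */
struct retained_event {
    struct retained_event *prev, *next;
    unsigned long long id;
};

struct retained_list {
    struct retained_event *head;
    /* Cursor used while resending retained events after a config change. */
    struct retained_event *next_retained;
};

/* Unlink and free ev (e.g. when its retention time expires). If the
 * resend cursor points at ev, move it to ev's successor first. */
void retained_expire(struct retained_list *list, struct retained_event *ev)
{
    assert(list != NULL && ev != NULL);
    if (list->next_retained == ev)
        list->next_retained = ev->next;
    if (ev->prev)
        ev->prev->next = ev->next;
    else
        list->head = ev->next;
    if (ev->next)
        ev->next->prev = ev->prev;
    free(ev);
}
```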

> 
> The rest looks good.
> 
> I'll commit the token callback changes (minus the test code) for the
> foundation.  The rest of your patch can be committed once the function
> duplicated in my first comment above is removed...  I'm not sure you
> have fully implemented expiration of retained events yet, so perhaps
> the work relating to recovery of expired events can wait for another
> commit.
> 
> We should talk more about the previous configuration idea if you're
> interested.

Yup.

Let me know what you want in util.c, and I'll move the match and time
functions there, update my code to use them, and send the patch out one
more time.

Thanks,
Mark.
-- 
Mark Haverkamp <markh at osdl.org>
