[Openais] Re: evt update for retained event recovery on config change

Steven Dake sdake at mvista.com
Fri Sep 24 15:30:05 PDT 2004


On Fri, 2004-09-24 at 15:17, Mark Haverkamp wrote:
> On Fri, 2004-09-24 at 14:43, Steven Dake wrote:
> > Mark & Daniel,
> > 
> > Clearly we are getting to the hard part (minus the low level
> > communication of course:-) of implementing the AIS specification...  I
> > think this is an excellent first shot at merge recovery.  I have a few
> > comments:
> > 
> > name_match is already implemented.  Use SaNameTisNameT.  Feel free to
> > rename it to non-spazmodic style if you desire.  When I started the ais
> > code, I thought the executive should follow the ais coding style, but I
> > have changed my mind to something more like the kernel coding style.  I
> > want to retain it for the lib directory to help debuggability though.
> > I think parse.c is the correct place for this function today.  We probably
> > want a util.c file to store our time related stuff and comparison
> > operations.
> 
> Do you want a new comparison function in util.c or move the existing
> one? If you like, I can create a util.c and start with the match and
> time functions.
> 

Sounds good, Mark.
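
For reference, a minimal sketch of the kind of comparison helper that could live in util.c. The struct below is a simplified stand-in for the AIS name type (a length plus a fixed-size value buffer); the names `struct name`, `NAME_MAX_LEN` and `name_match` are illustrative assumptions here, not the existing code in parse.c.

```c
#include <string.h>

/* Hypothetical stand-in for the AIS name type: a length followed by a
 * fixed-size buffer.  The size and field names are assumptions. */
#define NAME_MAX_LEN 256

struct name {
	unsigned short length;
	unsigned char value[NAME_MAX_LEN];
};

/*
 * Return 1 if both names have the same length and the same bytes,
 * 0 otherwise.  Only the first 'length' bytes are compared, so stale
 * data past the end of a name does not affect the result.
 */
static int name_match(const struct name *a, const struct name *b)
{
	if (a->length != b->length)
		return 0;
	if (a->length > NAME_MAX_LEN)
		return 0;	/* malformed length, refuse to compare */
	return memcmp(a->value, b->value, a->length) == 0;
}
```

Comparing the length first keeps the common mismatch case cheap and avoids reading bytes that are not part of either name.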

> > 
> > I have a feeling we could genericize the hashing of the node data
> > structure and add it somehow to exec/clm.c.  This is not a high priority
> > at the moment, perhaps something we can address next year.
> > 
> > Your approach to distributing events is clever (selecting the oldest
> > boot time).
> > 
> > Would it help to know the previous configuration of every new member in
> > the configuration change?  This way, each configuration could select one
> > member from the old synchronized set to synchronize its events to the
> > new configuration.  I had kicked this idea around for checkpointing
> > sync, but haven't got to it yet.
> 
> This is what Daniel and I have been kicking around.  We thought that
> keeping track of a previous config would allow oldest nodes from each
> partition to distribute retained events from their partitions and get a
> more accurate distribution of retained events.
> 

Cool.  Since we agree it can be useful, we can add something like this.
Since it's Friday, I'll have a look at this on Monday, or if one of you
wants to take it on, let me know.
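
To make the idea concrete, here is a rough sketch of how each old partition could pick its distributor, assuming every member advertises an identifier for its previous configuration along with its boot time.  The struct, its fields and `pick_distributor` are hypothetical names for illustration only; nothing like this exists in the tree yet.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-member record exchanged on a configuration change. */
struct member {
	uint32_t node_id;
	uint64_t prev_config_id;	/* configuration the node came from */
	uint64_t boot_time;		/* smaller value means older node */
};

/*
 * Among the members that came from prev_config_id, return the one with
 * the oldest boot time.  Ties are broken by the lowest node_id so every
 * node reaches the same answer.  Returns NULL if no member matches.
 */
static const struct member *pick_distributor(const struct member *m,
	size_t n, uint64_t prev_config_id)
{
	const struct member *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (m[i].prev_config_id != prev_config_id)
			continue;
		if (best == NULL ||
		    m[i].boot_time < best->boot_time ||
		    (m[i].boot_time == best->boot_time &&
		     m[i].node_id < best->node_id))
			best = &m[i];
	}
	return best;
}
```

Each node would run this once per distinct previous configuration in the new membership, and distribute its retained events only if it is the member returned for its own previous configuration.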

I need to get to the checkpointing merge recovery soon so we can get to
a release...

> > 
> > What happens if a retained event is expired during a configuration
> > change?  I know this window is small, but next_retained at line 805 or so
> > may point to a deleted event and cause a segfault or some other
> > undefined behavior.  I need to come up with a solution for checkpointing
> > too, so an exchange would be helpful on this subject.
> > 
> > At line 2169, do you still see this happening?  I think this can still
> > happen but I'm not sure.  The changes to ensure we avoid it are not very
> > elegant...
> 
> It can definitely happen, but I take that into account in the retained
> events expire code.  If I'm deleting the next_retained event, I fix the
> next_retained pointer at that time.
> 

Ok I understand the mechanism now.
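
As I understand it, the fix amounts to advancing the recovery cursor before the event is unlinked.  A minimal sketch, assuming a singly linked list of retained events; the types and the `retained_unlink` helper are simplified, hypothetical names and not the actual evt code:

```c
#include <stddef.h>

/* Hypothetical, simplified retained-event list for illustration. */
struct retained_event {
	int id;
	struct retained_event *next;
};

struct retained_list {
	struct retained_event *head;
	struct retained_event *next_retained;	/* recovery cursor */
};

/*
 * Unlink ev from the list.  If the recovery cursor points at ev, move
 * the cursor to ev's successor first so it never refers to a deleted
 * event.  Returns 1 if ev was found and unlinked, 0 otherwise.  The
 * caller is responsible for freeing ev afterwards.
 */
static int retained_unlink(struct retained_list *list,
	struct retained_event *ev)
{
	struct retained_event **pp;

	for (pp = &list->head; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == ev) {
			if (list->next_retained == ev)
				list->next_retained = ev->next;
			*pp = ev->next;
			ev->next = NULL;
			return 1;
		}
	}
	return 0;
}
```

If the expired event was the last one in the list, the cursor ends up NULL, which the recovery loop would need to treat as "nothing left to send".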

> > 
> > The rest looks good.
> > 
> > I'll commit the token callback changes (minus the test code) for the
> > foundation.  The rest of your patch should be committed with removal of
> > the duplicated function in the first comment above.  I'm not sure you
> > have fully implemented expiration of retained events yet, so perhaps
> > the work relating to recovery of expired events can wait for another
> > commit.
> > 
> > We should talk more about the previous configuration idea if you're
> > interested.
> 
> Yup.
> 
> 
> 
> Let me know what you want in the util.c file and I'll update my code for
> the new match and time functions that I'll place there and send the
> patch out one more time.
> 

Sounds good.

> Thanks,
> Mark.



