[Openais] Re: Patch II design review revision 1 !!

Steven Dake sdake at mvista.com
Thu Feb 24 00:55:45 PST 2005


Muni,
My apologies for not responding earlier; I've been swawmped by my day
job.. :(
On Wed, 2005-02-23 at 13:18, Muni Bajpai wrote:
> Hey Steve,
> 
> So I have made the changes as itemized by you.
> Some Notes:
> 1.) Have removed saCkptCheckpointSection from
> req_exec_ckpt_synchronize_state and added the following
>         SaCkptSectionDescriptorT sectionDescriptor;
>         SaUint32T dataOffSet;
>         SaUint32T dataSize;

I think this makes sense, but probably as another message.  See later
comments.

> 2.) Could not think of any other way to do
> ckpt_checkpoint_remove_cleanup. Let me know if you have any
> suggestions
> 
hmm I think its ok for now we can always refine later.

> Thanks,
> 
> Muni
> P.S I haven't tested any of this and plan to do that as soon the I
> base changes for recovery are in. When do you estimate that to be
> merged in ?
> 

I am hopeful I will have something available Monday.  I'd suggest,
however, trying to test without the base recovery code by adding the
calls to the locations specified in previous emails.  This should
atleast allow you to start testing your implementation..

comments on patch:
findProcessorIndex style is still somewhat wacky and has a c++ comment

"//save off the elements" c++ comment should be changed to C comment

Please do not specify variables in code body.  This is a C++ ism and
only works on newer GCC compilers but not 2.9.5 or earlier.

+               SaUint32T sync_sequence_number = 0;

The sectionData in req_exec_ckpt_synchronize_state is wrong.  It should
be void sectionData[0] and at the end of the structure.  Then ensure
that when creating the syncstate msg, to allocate sizeof syncstate +
(dataSize + dataOffset).  Then address the section data as a normal
array.

Still alot of C++ style comments please axe em for c comments.

when specifying todo, use uppercase TODO instead of todo makes searching
easier if all todos are consistent

Use a memcpy when initializing the checkpoint section descriptor:
example:
// Configure checkpoint section

I dont like the overload of section data updates along with checkpoint
and refcount updates.  I believe these two things should be two seperate
messages.  Logically that makes more sense, instead of updating the
creationattributes and reference countson every message.  The refcount
and creation attributes message should only be sent once, while a
section update should be sent for each section.  It is ok to have
multiple messages for synchronization.  We have a big address space to
use, so might as well use it. :)

Also why do you need sync_msg_sequence_number?  All messages are in
agreed order, so the order you send them will be the order they arrive. 
There should be no need to do any sequencing beyond that.

Good work

Regards
-steve



> -----Original Message-----
> From: Steven Dake [mailto:sdake at mvista.com] 
> Sent: Tuesday, February 22, 2005 3:27 PM
> To: Bajpai, Muni [NGC:B670:EXCH]
> Cc: openais at lists.osdl.org; Smith, Kristen [NGC:B675:EXCH]
> Subject: RE: Patch II design review !!
> 
> 
> On Tue, 2005-02-22 at 14:08, Muni Bajpai wrote:
> > Thanks Steven,
> > 
> > On the last item you mentioned for multiple checkpoints per section.
> > The Design as it exists today is that a sync_message is sent out for
> > every section in a checkpoint for all checkpoints. So if a
> checkpoint 
> > has 2 sections then 2 sync's are sent out for that checkpoint.
> > 
> 
> ok this works
> 
> > Will start work on the remaining items.
> > 
> > Thanks
> > 
> Thanks Muni
> 
> regards
> -steve
> 
> > Muni
> > 
> > -----Original Message-----
> > From: Steven Dake [mailto:sdake at mvista.com]
> > Sent: Tuesday, February 22, 2005 1:58 PM
> > To: Bajpai, Muni [NGC:B670:EXCH]
> > Cc: openais at lists.osdl.org; Smith, Kristen [NGC:B675:EXCH]
> > Subject: Re: Patch II design review !!
> > 
> > On Mon, 2005-02-21 at 12:10, Muni Bajpai wrote:
> > > Hi Steven,
> > > 
> > > Please take a look at the patch II and advise on the design. I
> > haven't
> > > run this code yet and is in a "In Progress" state. I wanted to get
> > > feedback before I continued down that path.
> > > 
> > >  -  Does checkpoint close get called on all the remaining
> processors 
> > > when a processor fails ?
> > > 
> > 
> > No.
> > 
> > When a processor fails, the processors will receive a configuration 
> > change.  The reason we keep a list of the processor identifiers
> along 
> > with their checkpoint reference count is so that each processor can 
> > reduce the reference count the appropriate number of times.  This is
> > similiar to closing, except that no actual close message should be 
> > sent.
> > 
> > > Thanks
> > > 
> > > Muni
> > > 
> > > <<patchtemp.txt>>
> > > 
> > 
> > Patch review:
> > 
> > Great work Muni!  I think the design is solid thus far.  I have a
> few 
> > suggestions.
> > 
> > When using pointers, please use * instead of [].
> > 
> > ex:
> > +static void initialize_ckpt_refcount_array (struct ckpt_refcnt
> > ckpt_refcount[]) {
> > 
> > should be:
> > 
> > +static void initialize_ckpt_refcount_array (struct ckpt_refcnt
> > *ckpt_refcount) {
> > 
> > 
> > I know this doesn't seem to make alot of sense when looking at the 
> > code, but in the past we have agreed to use mostly linux kernel
> coding 
> > style for the executive code, and SA forum coding style for the 
> > libraries. This makes debugging libraries for people using the APIs 
> > easier and makes working on the executive easier for us 
> > non-capitalized developers
> > to read the code.
> > 
> > Unfortunately this group decision was made after the checkpoint code
> > was developed.  Could you change the style to match this?  A good 
> > example:
> > 
> > for (checkpointList = checkpointListHead.next;
> > +        checkpointList != &checkpointListHead;
> > +        checkpointList = checkpointList->next) {
> > should be
> > 
> > for (list = checkpoint_list_head.next;
> >         list != &checkpoint_list_head;
> >         list =list->next);
> > 
> > Please do not assign structures.  Use memcpy instead.  Example:
> > +                       sync_msg->request.previous_ring_id =
> > saved_ring_id;                     
> > 
> > should be
> > memcpy (&sync_msg->request.previous_ring_id, &saved_ring_id, sizeof 
> > (struct memb_ring_id));
> > 
> > totempg_token_callback_create should use
> TOTEMPG_CALLBACK_TOKEN_SENT.
> > If you queue data on received, then that data has to be queued
> before
> > the token can be processed, which introduces latency into the
> > protocol.
> > 
> > Please do not use C++ style comments.  evil evil.  I use them for 
> > TODO's which is acceptable but that is the only valid case.  We
> intend 
> > to delete all those TODOs someday :)  Example:
> > +       //Check for empty list here
> > This should be
> > /*
> >  * Check for empty list here
> >  */
> > 
> > In ckpt_recovery_finalize there is an extra list_init which is
> > unnecessary:
> > +       list_init(&checkpointListHead);
> > 
> > I don't really like the structurein
> ckpt_recovery_process_members_exit
> > but if that is the only way to do it, then thats the only way to do 
> > it..
> > 
> > synchronize_state needs some work..  Specifically I don't think you 
> > can just do a sectioncreate..  You want to actually synchronize the
> > section
> > data.  Allocating void *data is not optimal since memory allocation
> > can
> > fail.  Nothing in the recovery path should allocate memory if at all
> > possible.  When comparing ring ids, use memcmp instead of the if
> > statement you have.  ex:
> > +       if ((req_exec_ckpt_sync_state->previous_ring_id.seq !=
> > saved_ring_id.seq)
> > +               ||
> > (req_exec_ckpt_sync_state->previous_ring_id.rep.s_addr !=
> > saved_ring_id.rep.s_addr)) {
> > +               return(0);
> > +       }
> > 
> > should be
> > 
> > if (memcmp (&req_exec_ckpt_sync_Sate->previous_ring_id,
> > &saved_ring_id,
> > sizeof (struct memb_ring_id) != 0) {
> >         return (0);
> > }
> > 
> > What should probably happen is that a new function should be created
> > (ckpt_section_create) that creates the checkpoint section
> information 
> > based upon a saCkptCheckpointSection data structure.
> > 
> > This same thing should be done for the saCkptCHeckpoint data
> structure 
> > as well (in case the processor adding the checkpoint doesn't yet
> have 
> > it in its database).
> > 
> > I'd also suggest somehow adding more then one section to the 
> > synchronize message.  It is possible a checkpoint could have alot of
> > sections.
> > 
> > Your off to a great start Muni...
> > 
> > Good work
> > -steve
> > 
> > 
> 
> 
>  




More information about the Openais mailing list