[Openais] RE: Patch II design review !!

Steven Dake sdake at mvista.com
Tue Feb 22 13:27:14 PST 2005


On Tue, 2005-02-22 at 14:08, Muni Bajpai wrote:
> Thanks Steven,
> 
> On the last item you mentioned for multiple checkpoints per section.
> The Design as it exists today is that a sync_message is sent out for
> every section in a checkpoint for all checkpoints. So if a checkpoint
> has 2 sections then 2 sync's are sent out for that checkpoint.
> 

ok this works

> Will start work on the remaining items.
> 
> Thanks
> 
Thanks Muni

regards
-steve

> Muni
> 
> -----Original Message-----
> From: Steven Dake [mailto:sdake at mvista.com] 
> Sent: Tuesday, February 22, 2005 1:58 PM
> To: Bajpai, Muni [NGC:B670:EXCH]
> Cc: openais at lists.osdl.org; Smith, Kristen [NGC:B675:EXCH]
> Subject: Re: Patch II design review !!
> 
> On Mon, 2005-02-21 at 12:10, Muni Bajpai wrote:
> > Hi Steven,
> > 
> > Please take a look at the patch II and advise on the design. I
> haven't
> > run this code yet and is in a "In Progress" state. I wanted to get
> > feedback before I continued down that path.
> > 
> >  -  Does checkpoint close get called on all the remaining processors
> > when a processor fails ?
> > 
> 
> No.
> 
> When a processor fails, the processors will receive a configuration
> change.  The reason we keep a list of the processor identifiers along
> with their checkpoint reference count is so that each processor can
> reduce the reference count the appropriate number of times.  This is
> similiar to closing, except that no actual close message should be
> sent.
> 
> > Thanks
> > 
> > Muni
> > 
> > <<patchtemp.txt>> 
> > 
> 
> Patch review:
> 
> Great work Muni!  I think the design is solid thus far.  I have a few
> suggestions.
> 
> When using pointers, please use * instead of [].
> 
> ex:
> +static void initialize_ckpt_refcount_array (struct ckpt_refcnt
> ckpt_refcount[]) {
> 
> should be:
> 
> +static void initialize_ckpt_refcount_array (struct ckpt_refcnt
> *ckpt_refcount) {
> 
> 
> I know this doesn't seem to make alot of sense when looking at the
> code,
> but in the past we have agreed to use mostly linux kernel coding style
> for the executive code, and SA forum coding style for the libraries. 
> This makes debugging libraries for people using the APIs easier and
> makes working on the executive easier for us non-capitalized
> developers
> to read the code.
> 
> Unfortunately this group decision was made after the checkpoint code
> was
> developed.  Could you change the style to match this?  A good example:
> 
> for (checkpointList = checkpointListHead.next;
> +        checkpointList != &checkpointListHead;
> +        checkpointList = checkpointList->next) {
> should be
> 
> for (list = checkpoint_list_head.next;
>         list != &checkpoint_list_head;
>         list =list->next);
> 
> Please do not assign structures.  Use memcpy instead.  Example:
> +                       sync_msg->request.previous_ring_id =
> saved_ring_id;                     
> 
> should be
> memcpy (&sync_msg->request.previous_ring_id, &saved_ring_id, sizeof
> (struct memb_ring_id));
> 
> totempg_token_callback_create should use TOTEMPG_CALLBACK_TOKEN_SENT. 
> If you queue data on received, then that data has to be queued before
> the token can be processed, which introduces latency into the
> protocol.
> 
> Please do not use C++ style comments.  evil evil.  I use them for
> TODO's
> which is acceptable but that is the only valid case.  We intend to
> delete all those TODOs someday :)  Example:
> +       //Check for empty list here
> This should be
> /*
>  * Check for empty list here
>  */
> 
> In ckpt_recovery_finalize there is an extra list_init which is
> unnecessary:
> +       list_init(&checkpointListHead);
> 
> I don't really like the structurein ckpt_recovery_process_members_exit
> but if that is the only way to do it, then thats the only way to do
> it..
> 
> synchronize_state needs some work..  Specifically I don't think you
> can
> just do a sectioncreate..  You want to actually synchronize the
> section
> data.  Allocating void *data is not optimal since memory allocation
> can
> fail.  Nothing in the recovery path should allocate memory if at all
> possible.  When comparing ring ids, use memcmp instead of the if
> statement you have.  ex:
> +       if ((req_exec_ckpt_sync_state->previous_ring_id.seq !=
> saved_ring_id.seq)
> +               ||
> (req_exec_ckpt_sync_state->previous_ring_id.rep.s_addr !=
> saved_ring_id.rep.s_addr)) {
> +               return(0);
> +       }
> 
> should be
> 
> if (memcmp (&req_exec_ckpt_sync_Sate->previous_ring_id,
> &saved_ring_id,
> sizeof (struct memb_ring_id) != 0) {
>         return (0);
> }
> 
> What should probably happen is that a new function should be created
> (ckpt_section_create) that creates the checkpoint section information
> based upon a saCkptCheckpointSection data structure.
> 
> This same thing should be done for the saCkptCHeckpoint data structure
> as well (in case the processor adding the checkpoint doesn't yet have
> it
> in its database).
> 
> I'd also suggest somehow adding more then one section to the
> synchronize
> message.  It is possible a checkpoint could have alot of sections.
> 
> Your off to a great start Muni...
> 
> Good work
> -steve
> 
> 




More information about the Openais mailing list