[Openais] RE: Patch II design review !!
Steven Dake
sdake at mvista.com
Tue Feb 22 13:27:14 PST 2005
On Tue, 2005-02-22 at 14:08, Muni Bajpai wrote:
> Thanks Steven,
>
> On the last item you mentioned for multiple checkpoints per section.
> The Design as it exists today is that a sync_message is sent out for
> every section in a checkpoint for all checkpoints. So if a checkpoint
> has 2 sections then 2 sync's are sent out for that checkpoint.
>
ok this works
> Will start work on the remaining items.
>
> Thanks
>
Thanks Muni
regards
-steve
> Muni
>
> -----Original Message-----
> From: Steven Dake [mailto:sdake at mvista.com]
> Sent: Tuesday, February 22, 2005 1:58 PM
> To: Bajpai, Muni [NGC:B670:EXCH]
> Cc: openais at lists.osdl.org; Smith, Kristen [NGC:B675:EXCH]
> Subject: Re: Patch II design review !!
>
> On Mon, 2005-02-21 at 12:10, Muni Bajpai wrote:
> > Hi Steven,
> >
> > Please take a look at the patch II and advise on the design. I
> haven't
> > run this code yet and is in a "In Progress" state. I wanted to get
> > feedback before I continued down that path.
> >
> > - Does checkpoint close get called on all the remaining processors
> > when a processor fails ?
> >
>
> No.
>
> When a processor fails, the processors will receive a configuration
> change. The reason we keep a list of the processor identifiers along
> with their checkpoint reference count is so that each processor can
> reduce the reference count the appropriate number of times. This is
> similiar to closing, except that no actual close message should be
> sent.
>
> > Thanks
> >
> > Muni
> >
> > <<patchtemp.txt>>
> >
>
> Patch review:
>
> Great work Muni! I think the design is solid thus far. I have a few
> suggestions.
>
> When using pointers, please use * instead of [].
>
> ex:
> +static void initialize_ckpt_refcount_array (struct ckpt_refcnt
> ckpt_refcount[]) {
>
> should be:
>
> +static void initialize_ckpt_refcount_array (struct ckpt_refcnt
> *ckpt_refcount) {
>
>
> I know this doesn't seem to make alot of sense when looking at the
> code,
> but in the past we have agreed to use mostly linux kernel coding style
> for the executive code, and SA forum coding style for the libraries.
> This makes debugging libraries for people using the APIs easier and
> makes working on the executive easier for us non-capitalized
> developers
> to read the code.
>
> Unfortunately this group decision was made after the checkpoint code
> was
> developed. Could you change the style to match this? A good example:
>
> for (checkpointList = checkpointListHead.next;
> + checkpointList != &checkpointListHead;
> + checkpointList = checkpointList->next) {
> should be
>
> for (list = checkpoint_list_head.next;
> list != &checkpoint_list_head;
> list =list->next);
>
> Please do not assign structures. Use memcpy instead. Example:
> + sync_msg->request.previous_ring_id =
> saved_ring_id;
>
> should be
> memcpy (&sync_msg->request.previous_ring_id, &saved_ring_id, sizeof
> (struct memb_ring_id));
>
> totempg_token_callback_create should use TOTEMPG_CALLBACK_TOKEN_SENT.
> If you queue data on received, then that data has to be queued before
> the token can be processed, which introduces latency into the
> protocol.
>
> Please do not use C++ style comments. evil evil. I use them for
> TODO's
> which is acceptable but that is the only valid case. We intend to
> delete all those TODOs someday :) Example:
> + //Check for empty list here
> This should be
> /*
> * Check for empty list here
> */
>
> In ckpt_recovery_finalize there is an extra list_init which is
> unnecessary:
> + list_init(&checkpointListHead);
>
> I don't really like the structurein ckpt_recovery_process_members_exit
> but if that is the only way to do it, then thats the only way to do
> it..
>
> synchronize_state needs some work.. Specifically I don't think you
> can
> just do a sectioncreate.. You want to actually synchronize the
> section
> data. Allocating void *data is not optimal since memory allocation
> can
> fail. Nothing in the recovery path should allocate memory if at all
> possible. When comparing ring ids, use memcmp instead of the if
> statement you have. ex:
> + if ((req_exec_ckpt_sync_state->previous_ring_id.seq !=
> saved_ring_id.seq)
> + ||
> (req_exec_ckpt_sync_state->previous_ring_id.rep.s_addr !=
> saved_ring_id.rep.s_addr)) {
> + return(0);
> + }
>
> should be
>
> if (memcmp (&req_exec_ckpt_sync_Sate->previous_ring_id,
> &saved_ring_id,
> sizeof (struct memb_ring_id) != 0) {
> return (0);
> }
>
> What should probably happen is that a new function should be created
> (ckpt_section_create) that creates the checkpoint section information
> based upon a saCkptCheckpointSection data structure.
>
> This same thing should be done for the saCkptCHeckpoint data structure
> as well (in case the processor adding the checkpoint doesn't yet have
> it
> in its database).
>
> I'd also suggest somehow adding more then one section to the
> synchronize
> message. It is possible a checkpoint could have alot of sections.
>
> Your off to a great start Muni...
>
> Good work
> -steve
>
>
More information about the Openais
mailing list