[Openais] Re: Patch II design review !!

Steven Dake sdake at mvista.com
Tue Feb 22 11:57:58 PST 2005


On Mon, 2005-02-21 at 12:10, Muni Bajpai wrote:
> Hi Steven,
> 
> Please take a look at the patch II and advise on the design. I haven't
> run this code yet and is in a "In Progress" state. I wanted to get
> feedback before I continued down that path.
> 
>  -  Does checkpoint close get called on all the remaining processors
> when a processor fails ?
> 

No.

When a processor fails, the processors will receive a configuration
change.  The reason we keep a list of the processor identifiers along
with their checkpoint reference count is so that each processor can
reduce the reference count the appropriate number of times.  This is
similiar to closing, except that no actual close message should be sent.

> Thanks
> 
> Muni
> 
> <<patchtemp.txt>> 
> 

Patch review:

Great work Muni!  I think the design is solid thus far.  I have a few
suggestions.

When using pointers, please use * instead of [].

ex:
+static void initialize_ckpt_refcount_array (struct ckpt_refcnt
ckpt_refcount[]) {

should be:

+static void initialize_ckpt_refcount_array (struct ckpt_refcnt
*ckpt_refcount) {


I know this doesn't seem to make alot of sense when looking at the code,
but in the past we have agreed to use mostly linux kernel coding style
for the executive code, and SA forum coding style for the libraries. 
This makes debugging libraries for people using the APIs easier and
makes working on the executive easier for us non-capitalized developers
to read the code.

Unfortunately this group decision was made after the checkpoint code was
developed.  Could you change the style to match this?  A good example:

for (checkpointList = checkpointListHead.next;
+        checkpointList != &checkpointListHead;
+        checkpointList = checkpointList->next) {
should be

for (list = checkpoint_list_head.next;
	list != &checkpoint_list_head;
	list =list->next);

Please do not assign structures.  Use memcpy instead.  Example:
+                       sync_msg->request.previous_ring_id =
saved_ring_id;                     

should be
memcpy (&sync_msg->request.previous_ring_id, &saved_ring_id, sizeof
(struct memb_ring_id));

totempg_token_callback_create should use TOTEMPG_CALLBACK_TOKEN_SENT. 
If you queue data on received, then that data has to be queued before
the token can be processed, which introduces latency into the protocol.

Please do not use C++ style comments.  evil evil.  I use them for TODO's
which is acceptable but that is the only valid case.  We intend to
delete all those TODOs someday :)  Example:
+       //Check for empty list here
This should be
/*
 * Check for empty list here
 */

In ckpt_recovery_finalize there is an extra list_init which is
unnecessary:
+       list_init(&checkpointListHead);

I don't really like the structurein ckpt_recovery_process_members_exit
but if that is the only way to do it, then thats the only way to do it..

synchronize_state needs some work..  Specifically I don't think you can
just do a sectioncreate..  You want to actually synchronize the section
data.  Allocating void *data is not optimal since memory allocation can
fail.  Nothing in the recovery path should allocate memory if at all
possible.  When comparing ring ids, use memcmp instead of the if
statement you have.  ex:
+       if ((req_exec_ckpt_sync_state->previous_ring_id.seq !=
saved_ring_id.seq)
+               ||
(req_exec_ckpt_sync_state->previous_ring_id.rep.s_addr !=
saved_ring_id.rep.s_addr)) {
+               return(0);
+       }

should be

if (memcmp (&req_exec_ckpt_sync_Sate->previous_ring_id, &saved_ring_id,
sizeof (struct memb_ring_id) != 0) {
	return (0);
}

What should probably happen is that a new function should be created
(ckpt_section_create) that creates the checkpoint section information
based upon a saCkptCheckpointSection data structure.

This same thing should be done for the saCkptCHeckpoint data structure
as well (in case the processor adding the checkpoint doesn't yet have it
in its database).

I'd also suggest somehow adding more then one section to the synchronize
message.  It is possible a checkpoint could have alot of sections.

Your off to a great start Muni...

Good work
-steve




More information about the Openais mailing list