<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=us-ascii">
<META NAME="Generator" CONTENT="MS Exchange Server version 5.5.2658.2">
<TITLE>RE: Patch II design review revision 1 !!</TITLE>
</HEAD>
<BODY>
<P><FONT SIZE=2>Thanks for spotting the big hole on the section data.</FONT>
</P>
<P><FONT SIZE=2>Thanks</FONT>
</P>
<P><FONT SIZE=2>Muni</FONT>
</P>
<P><FONT SIZE=2>-----Original Message-----</FONT>
<BR><FONT SIZE=2>From: Steven Dake [<A HREF="mailto:sdake@mvista.com">mailto:sdake@mvista.com</A>] </FONT>
<BR><FONT SIZE=2>Sent: Thursday, February 24, 2005 2:56 AM</FONT>
<BR><FONT SIZE=2>To: Bajpai, Muni [NGC:B670:EXCH]</FONT>
<BR><FONT SIZE=2>Cc: 'openais@lists.osdl.org'; Smith, Kristen [NGC:B675:EXCH]</FONT>
<BR><FONT SIZE=2>Subject: Re: Patch II design review revision 1 !!</FONT>
</P>
<BR>
<P><FONT SIZE=2>Muni,</FONT>
<BR><FONT SIZE=2>My apologies for not responding earlier; I've been swawmped by my day job.. :( On Wed, 2005-02-23 at 13:18, Muni Bajpai wrote:</FONT></P>
<P><FONT SIZE=2>> Hey Steve,</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> So I have made the changes as itemized by you.</FONT>
<BR><FONT SIZE=2>> Some Notes:</FONT>
<BR><FONT SIZE=2>> 1.) Have removed saCkptCheckpointSection from </FONT>
<BR><FONT SIZE=2>> req_exec_ckpt_synchronize_state and added the following</FONT>
<BR><FONT SIZE=2>> SaCkptSectionDescriptorT sectionDescriptor;</FONT>
<BR><FONT SIZE=2>> SaUint32T dataOffSet;</FONT>
<BR><FONT SIZE=2>> SaUint32T dataSize;</FONT>
</P>
<P><FONT SIZE=2>I think this makes sense, but probably as another message. See later comments.</FONT>
</P>
<P><FONT SIZE=2>> 2.) Could not think of any other way to do </FONT>
<BR><FONT SIZE=2>> ckpt_checkpoint_remove_cleanup. Let me know if you have any </FONT>
<BR><FONT SIZE=2>> suggestions</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>hmm I think its ok for now we can always refine later.</FONT>
</P>
<P><FONT SIZE=2>> Thanks,</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> Muni</FONT>
<BR><FONT SIZE=2>> P.S I haven't tested any of this and plan to do that as soon the I </FONT>
<BR><FONT SIZE=2>> base changes for recovery are in. When do you estimate that to be </FONT>
<BR><FONT SIZE=2>> merged in ?</FONT>
<BR><FONT SIZE=2>> </FONT>
</P>
<P><FONT SIZE=2>I am hopeful I will have something available Monday. I'd suggest, however, trying to test without the base recovery code by adding the calls to the locations specified in previous emails. This should atleast allow you to start testing your implementation..</FONT></P>
<P><FONT SIZE=2>comments on patch:</FONT>
<BR><FONT SIZE=2>findProcessorIndex style is still somewhat wacky and has a c++ comment</FONT>
</P>
<P><FONT SIZE=2>"//save off the elements" c++ comment should be changed to C comment</FONT>
</P>
<P><FONT SIZE=2>Please do not specify variables in code body. This is a C++ ism and only works on newer GCC compilers but not 2.9.5 or earlier.</FONT></P>
<P><FONT SIZE=2>+ SaUint32T sync_sequence_number = 0;</FONT>
</P>
<P><FONT SIZE=2>The sectionData in req_exec_ckpt_synchronize_state is wrong. It should be void sectionData[0] and at the end of the structure. Then ensure that when creating the syncstate msg, to allocate sizeof syncstate + (dataSize + dataOffset). Then address the section data as a normal array.</FONT></P>
<P><FONT SIZE=2>Still alot of C++ style comments please axe em for c comments.</FONT>
</P>
<P><FONT SIZE=2>when specifying todo, use uppercase TODO instead of todo makes searching easier if all todos are consistent</FONT>
</P>
<P><FONT SIZE=2>Use a memcpy when initializing the checkpoint section descriptor:</FONT>
<BR><FONT SIZE=2>example:</FONT>
<BR><FONT SIZE=2>// Configure checkpoint section</FONT>
</P>
<P><FONT SIZE=2>I dont like the overload of section data updates along with checkpoint and refcount updates. I believe these two things should be two seperate messages. Logically that makes more sense, instead of updating the creationattributes and reference countson every message. The refcount and creation attributes message should only be sent once, while a section update should be sent for each section. It is ok to have multiple messages for synchronization. We have a big address space to use, so might as well use it. :)</FONT></P>
<P><FONT SIZE=2>Also why do you need sync_msg_sequence_number? All messages are in agreed order, so the order you send them will be the order they arrive. </FONT></P>
<P><FONT SIZE=2>There should be no need to do any sequencing beyond that.</FONT>
</P>
<P><FONT SIZE=2>Good work</FONT>
</P>
<P><FONT SIZE=2>Regards</FONT>
<BR><FONT SIZE=2>-steve</FONT>
</P>
<BR>
<BR>
<P><FONT SIZE=2>> -----Original Message-----</FONT>
<BR><FONT SIZE=2>> From: Steven Dake [<A HREF="mailto:sdake@mvista.com">mailto:sdake@mvista.com</A>]</FONT>
<BR><FONT SIZE=2>> Sent: Tuesday, February 22, 2005 3:27 PM</FONT>
<BR><FONT SIZE=2>> To: Bajpai, Muni [NGC:B670:EXCH]</FONT>
<BR><FONT SIZE=2>> Cc: openais@lists.osdl.org; Smith, Kristen [NGC:B675:EXCH]</FONT>
<BR><FONT SIZE=2>> Subject: RE: Patch II design review !!</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> On Tue, 2005-02-22 at 14:08, Muni Bajpai wrote:</FONT>
<BR><FONT SIZE=2>> > Thanks Steven,</FONT>
<BR><FONT SIZE=2>> > </FONT>
<BR><FONT SIZE=2>> > On the last item you mentioned for multiple checkpoints per section. </FONT>
<BR><FONT SIZE=2>> > The Design as it exists today is that a sync_message is sent out for </FONT>
<BR><FONT SIZE=2>> > every section in a checkpoint for all checkpoints. So if a</FONT>
<BR><FONT SIZE=2>> checkpoint</FONT>
<BR><FONT SIZE=2>> > has 2 sections then 2 sync's are sent out for that checkpoint.</FONT>
<BR><FONT SIZE=2>> > </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> ok this works</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> > Will start work on the remaining items.</FONT>
<BR><FONT SIZE=2>> > </FONT>
<BR><FONT SIZE=2>> > Thanks</FONT>
<BR><FONT SIZE=2>> > </FONT>
<BR><FONT SIZE=2>> Thanks Muni</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> regards</FONT>
<BR><FONT SIZE=2>> -steve</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> > Muni</FONT>
<BR><FONT SIZE=2>> > </FONT>
<BR><FONT SIZE=2>> > -----Original Message-----</FONT>
<BR><FONT SIZE=2>> > From: Steven Dake [<A HREF="mailto:sdake@mvista.com">mailto:sdake@mvista.com</A>]</FONT>
<BR><FONT SIZE=2>> > Sent: Tuesday, February 22, 2005 1:58 PM</FONT>
<BR><FONT SIZE=2>> > To: Bajpai, Muni [NGC:B670:EXCH]</FONT>
<BR><FONT SIZE=2>> > Cc: openais@lists.osdl.org; Smith, Kristen [NGC:B675:EXCH]</FONT>
<BR><FONT SIZE=2>> > Subject: Re: Patch II design review !!</FONT>
<BR><FONT SIZE=2>> > </FONT>
<BR><FONT SIZE=2>> > On Mon, 2005-02-21 at 12:10, Muni Bajpai wrote:</FONT>
<BR><FONT SIZE=2>> > > Hi Steven,</FONT>
<BR><FONT SIZE=2>> > > </FONT>
<BR><FONT SIZE=2>> > > Please take a look at the patch II and advise on the design. I</FONT>
<BR><FONT SIZE=2>> > haven't</FONT>
<BR><FONT SIZE=2>> > > run this code yet and is in a "In Progress" state. I wanted to get </FONT>
<BR><FONT SIZE=2>> > > feedback before I continued down that path.</FONT>
<BR><FONT SIZE=2>> > > </FONT>
<BR><FONT SIZE=2>> > > - Does checkpoint close get called on all the remaining</FONT>
<BR><FONT SIZE=2>> processors</FONT>
<BR><FONT SIZE=2>> > > when a processor fails ?</FONT>
<BR><FONT SIZE=2>> > > </FONT>
<BR><FONT SIZE=2>> > </FONT>
<BR><FONT SIZE=2>> > No.</FONT>
<BR><FONT SIZE=2>> > </FONT>
<BR><FONT SIZE=2>> > When a processor fails, the processors will receive a configuration</FONT>
<BR><FONT SIZE=2>> > change. The reason we keep a list of the processor identifiers</FONT>
<BR><FONT SIZE=2>> along</FONT>
<BR><FONT SIZE=2>> > with their checkpoint reference count is so that each processor can</FONT>
<BR><FONT SIZE=2>> > reduce the reference count the appropriate number of times. This is</FONT>
<BR><FONT SIZE=2>> > similiar to closing, except that no actual close message should be </FONT>
<BR><FONT SIZE=2>> > sent.</FONT>
<BR><FONT SIZE=2>> > </FONT>
<BR><FONT SIZE=2>> > > Thanks</FONT>
<BR><FONT SIZE=2>> > > </FONT>
<BR><FONT SIZE=2>> > > Muni</FONT>
<BR><FONT SIZE=2>> > > </FONT>
<BR><FONT SIZE=2>> > > <<patchtemp.txt>></FONT>
<BR><FONT SIZE=2>> > > </FONT>
<BR><FONT SIZE=2>> > </FONT>
<BR><FONT SIZE=2>> > Patch review:</FONT>
<BR><FONT SIZE=2>> > </FONT>
<BR><FONT SIZE=2>> > Great work Muni! I think the design is solid thus far. I have a</FONT>
<BR><FONT SIZE=2>> few</FONT>
<BR><FONT SIZE=2>> > suggestions.</FONT>
<BR><FONT SIZE=2>> > </FONT>
<BR><FONT SIZE=2>> > When using pointers, please use * instead of [].</FONT>
<BR><FONT SIZE=2>> > </FONT>
<BR><FONT SIZE=2>> > ex:</FONT>
<BR><FONT SIZE=2>> > +static void initialize_ckpt_refcount_array (struct ckpt_refcnt</FONT>
<BR><FONT SIZE=2>> > ckpt_refcount[]) {</FONT>
<BR><FONT SIZE=2>> > </FONT>
<BR><FONT SIZE=2>> > should be:</FONT>
<BR><FONT SIZE=2>> > </FONT>
<BR><FONT SIZE=2>> > +static void initialize_ckpt_refcount_array (struct ckpt_refcnt</FONT>
<BR><FONT SIZE=2>> > *ckpt_refcount) {</FONT>
<BR><FONT SIZE=2>> > </FONT>
<BR><FONT SIZE=2>> > </FONT>
<BR><FONT SIZE=2>> > I know this doesn't seem to make alot of sense when looking at the</FONT>
<BR><FONT SIZE=2>> > code, but in the past we have agreed to use mostly linux kernel</FONT>
<BR><FONT SIZE=2>> coding</FONT>
<BR><FONT SIZE=2>> > style for the executive code, and SA forum coding style for the</FONT>
<BR><FONT SIZE=2>> > libraries. This makes debugging libraries for people using the APIs </FONT>
<BR><FONT SIZE=2>> > easier and makes working on the executive easier for us </FONT>
<BR><FONT SIZE=2>> > non-capitalized developers</FONT>
<BR><FONT SIZE=2>> > to read the code.</FONT>
<BR><FONT SIZE=2>> > </FONT>
<BR><FONT SIZE=2>> > Unfortunately this group decision was made after the checkpoint code </FONT>
<BR><FONT SIZE=2>> > was developed. Could you change the style to match this? A good</FONT>
<BR><FONT SIZE=2>> > example:</FONT>
<BR><FONT SIZE=2>> > </FONT>
<BR><FONT SIZE=2>> > for (checkpointList = checkpointListHead.next;</FONT>
<BR><FONT SIZE=2>> > + checkpointList != &checkpointListHead;</FONT>
<BR><FONT SIZE=2>> > + checkpointList = checkpointList->next) {</FONT>
<BR><FONT SIZE=2>> > should be</FONT>
<BR><FONT SIZE=2>> > </FONT>
<BR><FONT SIZE=2>> > for (list = checkpoint_list_head.next;</FONT>
<BR><FONT SIZE=2>> > list != &checkpoint_list_head;</FONT>
<BR><FONT SIZE=2>> > list =list->next);</FONT>
<BR><FONT SIZE=2>> > </FONT>
<BR><FONT SIZE=2>> > Please do not assign structures. Use memcpy instead. Example:</FONT>
<BR><FONT SIZE=2>> > + sync_msg->request.previous_ring_id =</FONT>
<BR><FONT SIZE=2>> > saved_ring_id; </FONT>
<BR><FONT SIZE=2>> > </FONT>
<BR><FONT SIZE=2>> > should be</FONT>
<BR><FONT SIZE=2>> > memcpy (&sync_msg->request.previous_ring_id, &saved_ring_id, sizeof</FONT>
<BR><FONT SIZE=2>> > (struct memb_ring_id));</FONT>
<BR><FONT SIZE=2>> > </FONT>
<BR><FONT SIZE=2>> > totempg_token_callback_create should use</FONT>
<BR><FONT SIZE=2>> TOTEMPG_CALLBACK_TOKEN_SENT.</FONT>
<BR><FONT SIZE=2>> > If you queue data on received, then that data has to be queued</FONT>
<BR><FONT SIZE=2>> before</FONT>
<BR><FONT SIZE=2>> > the token can be processed, which introduces latency into the </FONT>
<BR><FONT SIZE=2>> > protocol.</FONT>
<BR><FONT SIZE=2>> > </FONT>
<BR><FONT SIZE=2>> > Please do not use C++ style comments. evil evil. I use them for</FONT>
<BR><FONT SIZE=2>> > TODO's which is acceptable but that is the only valid case. We</FONT>
<BR><FONT SIZE=2>> intend</FONT>
<BR><FONT SIZE=2>> > to delete all those TODOs someday :) Example:</FONT>
<BR><FONT SIZE=2>> > + //Check for empty list here</FONT>
<BR><FONT SIZE=2>> > This should be</FONT>
<BR><FONT SIZE=2>> > /*</FONT>
<BR><FONT SIZE=2>> > * Check for empty list here</FONT>
<BR><FONT SIZE=2>> > */</FONT>
<BR><FONT SIZE=2>> > </FONT>
<BR><FONT SIZE=2>> > In ckpt_recovery_finalize there is an extra list_init which is</FONT>
<BR><FONT SIZE=2>> > unnecessary:</FONT>
<BR><FONT SIZE=2>> > + list_init(&checkpointListHead);</FONT>
<BR><FONT SIZE=2>> > </FONT>
<BR><FONT SIZE=2>> > I don't really like the structurein</FONT>
<BR><FONT SIZE=2>> ckpt_recovery_process_members_exit</FONT>
<BR><FONT SIZE=2>> > but if that is the only way to do it, then thats the only way to do</FONT>
<BR><FONT SIZE=2>> > it..</FONT>
<BR><FONT SIZE=2>> > </FONT>
<BR><FONT SIZE=2>> > synchronize_state needs some work.. Specifically I don't think you</FONT>
<BR><FONT SIZE=2>> > can just do a sectioncreate.. You want to actually synchronize the</FONT>
<BR><FONT SIZE=2>> > section</FONT>
<BR><FONT SIZE=2>> > data. Allocating void *data is not optimal since memory allocation</FONT>
<BR><FONT SIZE=2>> > can</FONT>
<BR><FONT SIZE=2>> > fail. Nothing in the recovery path should allocate memory if at all</FONT>
<BR><FONT SIZE=2>> > possible. When comparing ring ids, use memcmp instead of the if</FONT>
<BR><FONT SIZE=2>> > statement you have. ex:</FONT>
<BR><FONT SIZE=2>> > + if ((req_exec_ckpt_sync_state->previous_ring_id.seq !=</FONT>
<BR><FONT SIZE=2>> > saved_ring_id.seq)</FONT>
<BR><FONT SIZE=2>> > + ||</FONT>
<BR><FONT SIZE=2>> > (req_exec_ckpt_sync_state->previous_ring_id.rep.s_addr !=</FONT>
<BR><FONT SIZE=2>> > saved_ring_id.rep.s_addr)) {</FONT>
<BR><FONT SIZE=2>> > + return(0);</FONT>
<BR><FONT SIZE=2>> > + }</FONT>
<BR><FONT SIZE=2>> > </FONT>
<BR><FONT SIZE=2>> > should be</FONT>
<BR><FONT SIZE=2>> > </FONT>
<BR><FONT SIZE=2>> > if (memcmp (&req_exec_ckpt_sync_Sate->previous_ring_id,</FONT>
<BR><FONT SIZE=2>> > &saved_ring_id,</FONT>
<BR><FONT SIZE=2>> > sizeof (struct memb_ring_id) != 0) {</FONT>
<BR><FONT SIZE=2>> > return (0);</FONT>
<BR><FONT SIZE=2>> > }</FONT>
<BR><FONT SIZE=2>> > </FONT>
<BR><FONT SIZE=2>> > What should probably happen is that a new function should be created</FONT>
<BR><FONT SIZE=2>> > (ckpt_section_create) that creates the checkpoint section</FONT>
<BR><FONT SIZE=2>> information</FONT>
<BR><FONT SIZE=2>> > based upon a saCkptCheckpointSection data structure.</FONT>
<BR><FONT SIZE=2>> > </FONT>
<BR><FONT SIZE=2>> > This same thing should be done for the saCkptCHeckpoint data</FONT>
<BR><FONT SIZE=2>> structure</FONT>
<BR><FONT SIZE=2>> > as well (in case the processor adding the checkpoint doesn't yet</FONT>
<BR><FONT SIZE=2>> have</FONT>
<BR><FONT SIZE=2>> > it in its database).</FONT>
<BR><FONT SIZE=2>> > </FONT>
<BR><FONT SIZE=2>> > I'd also suggest somehow adding more then one section to the</FONT>
<BR><FONT SIZE=2>> > synchronize message. It is possible a checkpoint could have alot of</FONT>
<BR><FONT SIZE=2>> > sections.</FONT>
<BR><FONT SIZE=2>> > </FONT>
<BR><FONT SIZE=2>> > Your off to a great start Muni...</FONT>
<BR><FONT SIZE=2>> > </FONT>
<BR><FONT SIZE=2>> > Good work</FONT>
<BR><FONT SIZE=2>> > -steve</FONT>
<BR><FONT SIZE=2>> > </FONT>
<BR><FONT SIZE=2>> > </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> </FONT>
</P>
<BR>
</BODY>
</HTML>