[Openais] defect 206 - base recovery code

Steven Dake sdake at mvista.com
Mon Feb 28 16:14:39 PST 2005


Attached is a patch for the base recovery code.  It works by requiring four
new APIs in the service handler structure.  The four new APIs are:

sync_init
	initialize synchronization state
sync_process
	execute synchronization by sending messages - returns 0 when complete
sync_activate
	activate the state contained in this run of the synchronization
sync_abort
	abort this round of synchronization
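
As a rough illustration of how the four entry points fit together, here is
a minimal sketch.  The struct layout, the void/int signatures, the demo_*
toy service and the sync_run() driver are all assumptions made for this
example; they are not taken from the attached patch, and the real handler
structure carries other members as well.

```c
#include <stddef.h>

/* Hypothetical handler structure: only the four sync entry points are
 * shown, and their signatures are assumed for illustration. */
struct sync_handler {
	void (*sync_init)(void);     /* initialize synchronization state */
	int  (*sync_process)(void);  /* send messages; returns 0 when complete */
	void (*sync_activate)(void); /* activate the state from this run */
	void (*sync_abort)(void);    /* abort this round of synchronization */
};

/* Toy service state: three "messages" to send per synchronization run. */
static int demo_pending;
static int demo_active;
static int demo_aborts;

static void demo_sync_init(void)
{
	demo_pending = 3;
}

static int demo_sync_process(void)
{
	if (demo_pending > 0) {
		demo_pending--;
	}
	return demo_pending; /* 0 means this service is done */
}

static void demo_sync_activate(void)
{
	demo_active = 1;
}

static void demo_sync_abort(void)
{
	demo_pending = 0;
	demo_aborts++;
}

static const struct sync_handler demo_handler = {
	demo_sync_init,
	demo_sync_process,
	demo_sync_activate,
	demo_sync_abort
};

/* Hypothetical driver.  abort_after_steps < 0 means "never interrupted";
 * otherwise the run is aborted once that many sync_process calls have
 * been made, standing in for a configuration change mid-recovery.
 * Returns 0 if the state was activated, -1 if the round was aborted. */
static int sync_run(const struct sync_handler *h, int abort_after_steps)
{
	int steps = 0;

	h->sync_init();
	for (;;) {
		if (abort_after_steps >= 0 && steps == abort_after_steps) {
			h->sync_abort();
			return -1;
		}
		steps++;
		if (h->sync_process() == 0) {
			break;
		}
	}
	h->sync_activate();
	return 0;
}
```

Calling sync_run(&demo_handler, -1) runs the toy service through to
sync_activate; passing a small non-negative step count exercises the
sync_abort path instead.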

At the start of a service recovery, there is a barrier.  The barrier
works by each processor sending out a barrier request.  Once a processor
has received a barrier request from all processors in the configuration,
it knows the barrier is complete.  Once the barrier is complete, the
processor begins the synchronization.  Once synchronization is complete
(sync_process returns 0), the next service is selected and the process
is repeated.
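
The barrier bookkeeping on one processor could look roughly like the
sketch below.  It assumes processors are identified by small indexes
0..n-1 and that a processor also receives its own barrier request; the
names (struct barrier, barrier_start, barrier_receive, MAX_PROCESSORS) are
made up for this example and do not come from the patch.

```c
#include <string.h>

#define MAX_PROCESSORS 16 /* assumed upper bound for this sketch */

/* Per-processor view of one barrier. */
struct barrier {
	int member_count;                   /* processors in the configuration */
	int seen_count;                     /* distinct requests received so far */
	unsigned char seen[MAX_PROCESSORS]; /* seen[i] != 0 once i has reported */
};

/* Reset the barrier for a configuration of member_count processors. */
static void barrier_start(struct barrier *b, int member_count)
{
	if (member_count > MAX_PROCESSORS) {
		member_count = MAX_PROCESSORS;
	}
	memset(b->seen, 0, sizeof(b->seen));
	b->member_count = member_count;
	b->seen_count = 0;
}

/* Record a barrier request from processor "sender".  Duplicate and
 * out-of-range senders are ignored.  Returns 1 once a request has been
 * received from every processor in the configuration, otherwise 0. */
static int barrier_receive(struct barrier *b, int sender)
{
	if (sender >= 0 && sender < b->member_count && !b->seen[sender]) {
		b->seen[sender] = 1;
		b->seen_count++;
	}
	return b->seen_count == b->member_count;
}
```

Once barrier_receive() returns 1, the processor would start calling
sync_process for the current service.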

Services are synchronized in the order their entries appear in the
handler list.  So CLM is first, EVS second, etc.  EVS will not begin
until CLM has completed, because EVS waits on a barrier before it starts.
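
To make the ordering concrete, here is a small sketch that walks a handler
list in order and records which service's sync_process is called.  The
list contents, the call log and sync_all() are hypothetical, and the
barrier is reduced to a comment; only the "finish one service before
starting the next" behaviour is shown.

```c
#include <stddef.h>

#define LOG_MAX 16

/* Record of which service was asked to process, in call order. */
static const char *call_log[LOG_MAX];
static size_t call_count;

static void log_call(const char *name)
{
	if (call_count < LOG_MAX) {
		call_log[call_count++] = name;
	}
}

/* Toy services: CLM needs two process calls, EVS needs one. */
static int clm_left = 2;
static int evs_left = 1;

static int clm_sync_process(void)
{
	log_call("CLM");
	if (clm_left > 0) {
		clm_left--;
	}
	return clm_left; /* 0 when complete */
}

static int evs_sync_process(void)
{
	log_call("EVS");
	if (evs_left > 0) {
		evs_left--;
	}
	return evs_left; /* 0 when complete */
}

struct service {
	const char *name;
	int (*sync_process)(void);
};

/* Order in this list is the order of synchronization. */
static const struct service handler_list[] = {
	{ "CLM", clm_sync_process },
	{ "EVS", evs_sync_process }
};

/* Synchronize each service in list order, one at a time. */
static void sync_all(const struct service *list, size_t count)
{
	size_t i;

	for (i = 0; i < count; i++) {
		/* A real implementation would wait for the barrier here. */
		while (list[i].sync_process() != 0) {
			/* keep sending until this service reports completion */
		}
	}
}
```

With this list, sync_all(handler_list, 2) calls the CLM handler until it
returns 0 and only then touches the EVS handler.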

Mark, this will solve your problem where clm data wasn't available
before the evt service started recovery.  Unfortunately it will also create
some extra work to change the synchronization mechanism...

Muni, your code should go pretty easily into this new infrastructure
since we designed the ckpt synchronization with this in mind.

I have done some testing with 4 processors and things seem to work well
with one service (CLM).  Barrier completions always happen at the
correct time.  I have not tested more than one service, but the logic
should be the same.

Thanks, and comments welcome
-steve


-------------- next part --------------
A non-text attachment was scrubbed...
Name: defect-206.patch
Type: text/x-patch
Size: 22168 bytes
Desc: not available
Url : http://lists.linux-foundation.org/pipermail/openais/attachments/20050228/51871023/defect-206-0001.bin

