[Openais] PATCH: aisexec leak & corruption (whitetank & trunk)(retry)

Muni Bajpai muni.osdl at gmail.com
Tue Aug 29 08:24:26 PDT 2006


-- INLINE --
----- Original Message ----- 
From: "Fabien THOMAS" <fabien.thomas at netasq.com>
To: "Muni Bajpai" <muni.osdl at gmail.com>
Cc: <openais at lists.osdl.org>; "Muni Bajpai" <muniba at nortel.com>
Sent: Tuesday, August 29, 2006 10:01 AM
Subject: Re: [Openais] PATCH: aisexec leak & corruption (whitetank & 
trunk)(retry)


It is a very simple scenario to reproduce the problem:

On the first node, run aisexec.
On the same node, run a client that does:
init
open
iterate
close
finalize

On another node, run aisexec:
the recovery process is launched on the first node (which has all the
information locally)
=> the internal structure is corrupted if the recovery happens during
the iteration

What is really strange to me is why we need to create a new
checkpoint on a node that is the origin of the information.
Why is it not possible to link the new checkpoint structure to the
current iterators by name?

>The reason we recreate the checkpoints even on the node where the
>checkpoint originated is that there is no way of predicting the
>network-wide view of the checkpoints after a configuration change. If
>someone adds a section that your node is not aware of, then we have a
>disconnect. We could have just realloc'd the existing memory, but we
>decided it was cleaner/faster to create fresh structures on recovery.

>The real problem here is that we are saving off references to the
>section instead of re-evaluating them on every iteration like all the
>other calls do. I didn't write that code, but I suspect it was done in
>the interest of speed, to avoid calling into the handledb etc. every time.
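
The difference between saving a reference and re-evaluating it can be
sketched roughly as below. This is only an illustration under stated
assumptions: the structures and every function name (section_find,
iterator_next, demo_iterate_across_recovery, and so on) are hypothetical,
simplified stand-ins, not the real aisexec code. The iterator entry keeps
the section name rather than a pointer and looks the section up on each
step, so freeing and recreating the section list does not leave it with a
stale address.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SECTION_NAME_LEN 32
#define MAX_ENTRIES 8

/* Hypothetical, simplified stand-ins; not the real aisexec types. */
struct section {
    char name[SECTION_NAME_LEN];
    struct section *next;
};

struct checkpoint {
    struct section *sections;   /* freed and rebuilt on recovery */
};

/* Each entry remembers the section by name, not by address. */
struct iteration_entry {
    char section_name[SECTION_NAME_LEN];
};

struct iterator {
    struct iteration_entry entries[MAX_ENTRIES];
    int count;
    int pos;
};

static void section_add(struct checkpoint *ckpt, const char *name)
{
    struct section *s = calloc(1, sizeof(*s));
    assert(s != NULL);
    strncpy(s->name, name, SECTION_NAME_LEN - 1);  /* calloc keeps the NUL */
    s->next = ckpt->sections;
    ckpt->sections = s;
}

static void sections_free(struct checkpoint *ckpt)
{
    while (ckpt->sections != NULL) {
        struct section *next = ckpt->sections->next;
        free(ckpt->sections);
        ckpt->sections = next;
    }
}

static struct section *section_find(struct checkpoint *ckpt, const char *name)
{
    struct section *s;
    for (s = ckpt->sections; s != NULL; s = s->next)
        if (strcmp(s->name, name) == 0)
            return s;
    return NULL;
}

static void iterator_init(struct iterator *it, struct checkpoint *ckpt)
{
    struct section *s;
    it->count = 0;
    it->pos = 0;
    for (s = ckpt->sections; s != NULL && it->count < MAX_ENTRIES; s = s->next)
        strcpy(it->entries[it->count++].section_name, s->name);
}

/* Re-resolve by name on every call; skip sections that no longer exist. */
static struct section *iterator_next(struct iterator *it, struct checkpoint *ckpt)
{
    while (it->pos < it->count) {
        const char *name = it->entries[it->pos++].section_name;
        struct section *s = section_find(ckpt, name);
        if (s != NULL)
            return s;
    }
    return NULL;
}

/* Returns how many sections were visited across a simulated recovery. */
int demo_iterate_across_recovery(void)
{
    struct checkpoint ckpt = { NULL };
    struct iterator it;
    int visited = 0;

    section_add(&ckpt, "a");
    section_add(&ckpt, "b");
    iterator_init(&it, &ckpt);
    if (iterator_next(&it, &ckpt) != NULL)
        visited++;

    /* Simulated recovery: old structures are freed, fresh ones created. */
    sections_free(&ckpt);
    section_add(&ckpt, "a");
    section_add(&ckpt, "b");

    while (iterator_next(&it, &ckpt) != NULL)
        visited++;
    sections_free(&ckpt);
    return visited;
}
```

The cost is one lookup per step, which is the speed trade-off mentioned
above.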

>So it needs to be reworked so that either we sacrifice the speed and
>re-evaluate the refs as all the other calls do, or we force the iterators
>to reinit, since theoretically you should not continue to iterate a list
>that has changed.
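
One possible way to force iterators to reinit is a generation counter,
sketched below. This is an assumption for illustration, not something
taken from the aisexec source: the types, the ITER_RESTARTED status and
the function names are all hypothetical. Recovery bumps a counter on the
checkpoint, and an iterator that sees a different generation restarts
from the beginning and tells the caller so.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real aisexec types. */
struct checkpoint {
    unsigned int generation;   /* bumped whenever recovery rebuilds sections */
    int section_count;
};

struct iterator {
    unsigned int generation;   /* generation seen at (re)initialization */
    int pos;
};

enum iter_status { ITER_OK, ITER_END, ITER_RESTARTED };

void iterator_init(struct iterator *it, const struct checkpoint *ckpt)
{
    it->generation = ckpt->generation;
    it->pos = 0;
}

/*
 * Hands out the next section index. If recovery ran since the iterator
 * was initialized, the iterator is reset and the caller is told that the
 * iteration has to start over.
 */
enum iter_status iterator_next(struct iterator *it,
                               const struct checkpoint *ckpt,
                               int *index_out)
{
    if (it->generation != ckpt->generation) {
        iterator_init(it, ckpt);
        return ITER_RESTARTED;
    }
    if (it->pos >= ckpt->section_count)
        return ITER_END;
    *index_out = it->pos++;
    return ITER_OK;
}

/* Simulated recovery: the section set is replaced, generation moves on. */
void checkpoint_recover(struct checkpoint *ckpt, int new_section_count)
{
    ckpt->section_count = new_section_count;
    ckpt->generation++;
}
```

With this shape, recovery does not need to know about any iterator; each
iterator notices the change on its next call.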

Sorry if it's a dumb question, but I'm not aware of all the operations
done in aisexec.

On 29 Aug 2006, at 16:53, Muni Bajpai wrote:

> So iteration at this point cannot survive a recovery process because the
> iterator has references to the sections, which are not valid once recovery
> happens. Recovery has no clue about these references and hence cannot
> update them.
>
> What I propose is that we re-initialize the iterator after a recovery
> to update the references. This means that any current iteration
> would have to be restarted. I might be able to get around that, but that
> is the worst-case scenario.
>
> - Muni
>
> ----- Original Message -----
> From: "Fabien THOMAS" <fabien.thomas at netasq.com>
> To: <openais at lists.osdl.org>
> Cc: "Muni Bajpai" <muniba at nortel.com>
> Sent: Tuesday, August 29, 2006 5:18 AM
> Subject: Re: [Openais] PATCH: aisexec leak & corruption (whitetank &
> trunk)(retry)
>
>
>> There is one remaining problem, but I cannot find the cause:
>> during checkpoint recovery it seems that the structure is corrupted
>> (the full log is attached to my previous post).
>>
>
> Maybe I have an idea here:
>
> iteration_entry contains a pointer to the checkpoint section, and it
> seems that the recovery process does not update the pointer to the new
> section list address.
> Can you confirm?
>
>
> fabien



