[Openais] PATCH: aisexec leak & corruption (whitetank &
trunk)(retry)
Muni Bajpai
muni.osdl at gmail.com
Tue Aug 29 08:24:26 PDT 2006
-- INLINE --
----- Original Message -----
From: "Fabien THOMAS" <fabien.thomas at netasq.com>
To: "Muni Bajpai" <muni.osdl at gmail.com>
Cc: <openais at lists.osdl.org>; "Muni Bajpai" <muniba at nortel.com>
Sent: Tuesday, August 29, 2006 10:01 AM
Subject: Re: [Openais] PATCH: aisexec leak & corruption (whitetank &
trunk)(retry)
It is a very simple scenario to reproduce the problem:
on the first node, run aisexec.
on the same node, run a client that does:
init
open
iterate
close
finalize
on another node, run aisexec:
the recovery process is launched on the first node (which has all the
information locally)
=> the internal structure is corrupted if the recovery happens during
the iterate process
What is really strange to me is: why do we need to create a new
checkpoint on a node that is the origin of the information?
Why is it not possible to link the new checkpoint structure to the
current iterators by name?
>The reason why we recreate the checkpoints even at the point of origin of
>the checkpoint is that there is no way of predicting the network-wide view
>of the checkpoints post config. So let's say someone adds a section
>that your node is not aware of; then we have a disconnect. We could have
>just realloc'd the existing memory, but we decided it was cleaner/faster
>to create fresh structures on recovery.
>The real problem here is that we are saving off references to the
>section instead of re-evaluating it on every iteration like all the other
>calls. I didn't write that code, but I suspect that was in the interest of
>speed, to avoid making the call to the handledb etc. every time.
>So it needs to be reworked so that either we sacrifice the speed and
>re-evaluate the refs as all the other calls do, or we force the iterators
>to reinit, as theoretically you should not continue to iterate a list that
>has changed.
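To make the first option concrete, here is a minimal sketch of an iterator that keeps no pointer into the section list and re-resolves its position by section name on every step. All types and function names below (section_list, name_iterator, name_iter_next) are hypothetical stand-ins, not the real aisexec structures, and the lookup is a plain scan by name rather than a call through the handle database. It assumes section names are unique and NUL-terminated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; not the real aisexec structures. */
struct section { char name[32]; };

struct section_list {
    struct section *sections;  /* may be replaced wholesale by recovery */
    size_t count;
};

struct name_iterator {
    char last[32];  /* name of the last section returned */
    int started;    /* 0 until the first section has been returned */
};

static void name_iter_init(struct name_iterator *it)
{
    assert(it != NULL);
    it->last[0] = '\0';
    it->started = 0;
}

/*
 * Return the section with the smallest name strictly greater than the
 * last one returned, or NULL when there is none left. The list is
 * scanned on every call, so each step costs O(n); that is the speed
 * being traded for never holding a stale pointer.
 */
static const struct section *name_iter_next(struct name_iterator *it,
                                            const struct section_list *list)
{
    const struct section *best = NULL;

    assert(it != NULL && list != NULL);
    for (size_t i = 0; i < list->count; i++) {
        const struct section *s = &list->sections[i];

        if (it->started && strcmp(s->name, it->last) <= 0)
            continue;  /* already visited */
        if (best == NULL || strcmp(s->name, best->name) < 0)
            best = s;
    }
    if (best != NULL) {
        strncpy(it->last, best->name, sizeof(it->last) - 1);
        it->last[sizeof(it->last) - 1] = '\0';
        it->started = 1;
    }
    return best;
}
```

With this shape, the section list can be swapped for a freshly built one between two calls and the iteration simply continues in name order, picking up any section that was added with a name after the current position.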
Sorry if it's a dumb question, but I'm not aware of all the operations
done in aisexec.
On 29 Aug 06, at 16:53, Muni Bajpai wrote:
> So iteration at this point cannot survive a recovery process because the
> iterator has references to the sections which are not valid once recovery
> happens. Now recovery has no clue of these references and hence cannot
> update them.
>
> What I propose is that we re-initialize the iterator after a recovery
> to update the references. What this means is that any current iteration
> would have to be restarted. I might be able to get around that but that
> is the worst case scenario.
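One way the proposed re-initialization could be detected and enforced is sketched below, under stated assumptions: the checkpoint carries a generation counter that recovery bumps when it builds fresh section structures, and the iterator remembers the generation it was initialized against. Every name here (checkpoint_rebuild, iter_init, iter_next, ITER_STALE) is hypothetical and only illustrates the idea; it is not the aisexec code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; not the real aisexec structures. */
struct section { char name[32]; };

struct checkpoint {
    struct section *sections;  /* replaced wholesale by recovery */
    size_t count;
    unsigned generation;       /* bumped each time sections is rebuilt */
};

struct iterator {
    struct checkpoint *ckpt;
    size_t next;               /* a position, not a pointer into sections */
    unsigned generation;       /* generation seen at (re)initialization */
};

enum iter_status { ITER_OK, ITER_END, ITER_STALE };

/* Simulates recovery: build fresh structures, then drop the old ones. */
static void checkpoint_rebuild(struct checkpoint *c,
                               const char *const *names, size_t n)
{
    struct section *fresh = calloc(n ? n : 1, sizeof(*fresh));

    assert(fresh != NULL);  /* a sketch: real code would report the error */
    for (size_t i = 0; i < n; i++)
        strncpy(fresh[i].name, names[i], sizeof(fresh[i].name) - 1);
    free(c->sections);
    c->sections = fresh;
    c->count = n;
    c->generation++;
}

static void iter_init(struct iterator *it, struct checkpoint *c)
{
    assert(it != NULL && c != NULL);
    it->ckpt = c;
    it->next = 0;
    it->generation = c->generation;
}

/* Refuses to touch the section list if recovery ran since iter_init. */
static enum iter_status iter_next(struct iterator *it,
                                  const struct section **out)
{
    if (it->generation != it->ckpt->generation)
        return ITER_STALE;  /* caller must call iter_init again */
    if (it->next >= it->ckpt->count)
        return ITER_END;
    *out = &it->ckpt->sections[it->next++];
    return ITER_OK;
}
```

In this sketch an iteration in progress is not silently corrupted; the next call returns ITER_STALE and the caller restarts from the beginning, which matches the worst case described above.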
>
> - Muni
>
> ----- Original Message -----
> From: "Fabien THOMAS" <fabien.thomas at netasq.com>
> To: <openais at lists.osdl.org>
> Cc: "Muni Bajpai" <muniba at nortel.com>
> Sent: Tuesday, August 29, 2006 5:18 AM
> Subject: Re: [Openais] PATCH: aisexec leak & corruption (whitetank &
> trunk)(retry)
>
>
>> There is one remaining problem, but I cannot find the reason:
>> during checkpoint recovery it seems that the structure is corrupted
>> (the full log is attached to my previous post).
>>
>
> Maybe I have an idea here:
>
> iteration_entry contains a pointer to the checkpoint section, and it seems
> that the recovery process does not update the pointer to the new
> section list address.
> Can you confirm?
>
>
> fabien
> _______________________________________________
> Openais mailing list
> Openais at lists.osdl.org
> https://lists.osdl.org/mailman/listinfo/openais
>