[Openais] flow control and merge recovery
sdake at mvista.com
Thu Sep 23 16:32:31 PDT 2004
On Thu, 2004-09-23 at 16:10, Mark Haverkamp wrote:
> On Thu, 2004-09-23 at 15:53, Steven Dake wrote:
> > On Thu, 2004-09-23 at 15:29, Mark Haverkamp wrote:
> > > On Wed, 2004-09-22 at 14:04, Steven Dake wrote:
> > > > On Wed, 2004-09-22 at 13:47, Mark Haverkamp wrote:
> > >
> > > > >
> > > > > So, would I call gmi_send_ok()/gmi_mcast() until I could send no more?
> > > > > Couldn't I still do that without the token callback?
> > > > >
> > > >
> > > > Yup it could be done without the token callback. The advantage of the
> > > > token callback is that it will specify when it makes sense to do
> > > > gmi_send_ok/gmi_mcast again to finish the recovery.
> > >
> > > Are you working on this callback code now? Also, looking at gmi_mcast,
> > > it seems to either return success (0) or assert. Would it be reasonable
> > > to have gmi_mcast call gmi_send_ok, and return status if it can't queue
> > > the message?
> > >
> > I have not started the callback code. It should be pretty simple to
> > add. If you want to use this mechanism, I'll add it in. Ideas on the
> > interface? Something like:
> > gmi_fc_open_create (void *handle, int (*callback_fn), void *data);
> > gmi_fc_open_destroy (void *handle);
> > (flow control opened register/unregister)
> This sounds useful, one thing that I have been concerned about with my
> current approach is that if for some reason I can't do a gmi_mcast, I
> won't have a way to try later since I use the receipt of the message
> that I send to trigger the next one. (I haven't seen this yet though).
> How does the function calling my callback code know that it can handle
> the mcast that I want to do? Do I need to call gmi_send_ok first and
> just return 0 if I can't send my message?
We can have gmi_mcast return -1 if it couldn't send the message (because
the buffer was full or some other reason) and do the assertion changes
throughout the rest of the gmi_mcast callers.
> > if callback_fn returns -1, no more callbacks will be called. This would
> > indicate the outgoing queues are full. If callback_fn returns 0, more
> > callbacks would be called until all have been called or -1 is returned.
> > If a new configuration change comes while fc_open_destroy is pending,
> > call destroy to start the recovery over (with a new data element).
> I'm not sure that I understand this. Does this mean that if a new
> configuration change happens while I have an active call back, that I
> destroy the current one and create a new one?
It depends on how you want to do it. I've been thinking of this for
checkpointing, and I think I want to destroy whatever context I pass in
data and start fresh with a new context. But this may not be an issue
depending on how it's implemented.
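
A rough sketch of how the register/unregister idea and the 0 / -1 return
convention might fit together. Everything here is an assumption made for
illustration, not the openais code: the names fc_open_create,
fc_open_destroy and fc_open_dispatch, the integer slot handle (the
proposal above passes a void *handle), the callback signature taking only
the data pointer, and the recovery_ctx structure are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define FC_MAX_CALLBACKS 8	/* hypothetical table size */

/* Hypothetical callback type: return 0 to let later callbacks run,
 * -1 to report that the outgoing queue is full. */
typedef int (*fc_open_fn) (void *data);

struct fc_open_entry {
	fc_open_fn callback_fn;
	void *data;
	int in_use;
};

static struct fc_open_entry fc_open_table[FC_MAX_CALLBACKS];

/* Register a callback. Returns a slot index used as the handle,
 * or -1 if the table is full. */
int fc_open_create (fc_open_fn callback_fn, void *data)
{
	int i;

	assert (callback_fn != NULL);
	for (i = 0; i < FC_MAX_CALLBACKS; i++) {
		if (!fc_open_table[i].in_use) {
			fc_open_table[i].callback_fn = callback_fn;
			fc_open_table[i].data = data;
			fc_open_table[i].in_use = 1;
			return i;
		}
	}
	return -1;
}

/* Unregister a callback; the caller owns and frees its own data. */
void fc_open_destroy (int handle)
{
	if (handle >= 0 && handle < FC_MAX_CALLBACKS)
		fc_open_table[handle].in_use = 0;
}

/* Would be run when flow control opens (for example on token receipt).
 * Calls registered callbacks in slot order and stops at the first one
 * that returns -1. Returns the number of callbacks invoked. */
int fc_open_dispatch (void)
{
	int i, called = 0;

	for (i = 0; i < FC_MAX_CALLBACKS; i++) {
		if (!fc_open_table[i].in_use)
			continue;
		called++;
		if (fc_open_table[i].callback_fn (fc_open_table[i].data) == -1)
			break;	/* queues full, try again on the next dispatch */
	}
	return called;
}

/* Hypothetical recovery context: messages still to send and the room
 * left in the outgoing queue. */
struct recovery_ctx {
	int remaining;
	int queue_room;
};

/* Example callback: send until done (0) or the queue fills up (-1). */
int recovery_callback (void *data)
{
	struct recovery_ctx *ctx = data;

	while (ctx->remaining > 0) {
		if (ctx->queue_room == 0)
			return -1;
		ctx->queue_room--;
		ctx->remaining--;
	}
	return 0;
}
```

With this shape, a configuration change during recovery would be handled
by calling fc_open_destroy on the old handle and fc_open_create again
with a fresh context, which matches the "start fresh" approach described
above.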
> > We can make gmi_mcast not assert, but we have to be careful. The assert
> > is in there to catch bugs. If a caller calls gmi_mcast, and the
> > message can't be queued, then in every case in the current openais that
> > is a serious bug. This shouldn't happen today with the flow control
> > code (which is why there is an assert there).
> As long as the mcast is done from the library interface side.
> > If we change the semantics of gmi_mcast, by allowing it to fail to queue
> > without asserting, we should be careful to either handle return values
> > (in the case of recovery) or assert where a -1 return value shouldn't
> > happen.
> > So most of the gmi_mcast calls would be something like:
> > res = gmi_mcast (...)
> > assert (res == 0);
> > for all of the services, except in the case where a res of -1 can be
> > handled (such as merge recovery).
I'll work on the token rotation callback. Can you work up the patch for
the gmi_send_ok/gmi_mcast/assert changes?
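
For reference, a minimal sketch of the two caller styles discussed in
this thread, assuming the proposed semantics where the multicast call
returns -1 instead of asserting when the message can't be queued. The
functions sketch_send_ok, sketch_mcast and sketch_queue_drain are
stand-ins with made-up signatures and a made-up fixed-size queue; they
are not the real gmi_send_ok/gmi_mcast.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_SLOTS 4	/* hypothetical outgoing queue depth */

static int queue_used = 0;

/* Stand-in for gmi_send_ok: nonzero if one more message fits. */
int sketch_send_ok (void)
{
	return queue_used < QUEUE_SLOTS;
}

/* Stand-in for gmi_mcast with the proposed semantics:
 * 0 if the message was queued, -1 if it could not be queued. */
int sketch_mcast (const void *msg, size_t len)
{
	(void)msg;
	(void)len;
	if (!sketch_send_ok ())
		return -1;
	queue_used++;
	return 0;
}

/* Simulates the queue emptying, for example after a token rotation. */
void sketch_queue_drain (void)
{
	queue_used = 0;
}

/* Service path: flow control should already guarantee room, so a
 * failure to queue is treated as a bug and asserted on. */
void service_send (const void *msg, size_t len)
{
	int res = sketch_mcast (msg, len);

	assert (res == 0);
	(void)res;	/* silences unused warning when NDEBUG is set */
}

/* Recovery path: a -1 is expected and handled. Sends up to `pending`
 * messages and returns how many were actually queued, so the caller
 * can resume with the remainder later. */
int recovery_send (int pending)
{
	static const char payload[] = "recovery";
	int sent = 0;

	while (sent < pending &&
		sketch_mcast (payload, sizeof (payload)) == 0)
		sent++;
	return sent;
}
```

The recovery caller keeps its own count of what is left to send, so a
short return from recovery_send is simply a signal to wait for the next
opportunity rather than an error.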