[Openais] flow control and merge recovery

Thu Sep 23 16:10:13 PDT 2004

On Thu, 2004-09-23 at 15:53, Steven Dake wrote:
> On Thu, 2004-09-23 at 15:29, Mark Haverkamp wrote:
> > On Wed, 2004-09-22 at 14:04, Steven Dake wrote:
> > > On Wed, 2004-09-22 at 13:47, Mark Haverkamp wrote:
> > 
> > > > 
> > > > So, would I call gmi_send_ok()/gmi_mcast() until I could send no more?
> > > > Couldn't I still do that without the token callback?
> > > > 
> > > 
> > > Yup, it could be done without the token callback.  The advantage of the
> > > token callback is that it will specify when it makes sense to do
> > > gmi_send_ok/gmi_mcast again to finish the recovery.
> > 
> > Are you working on this callback code now?  Also, looking at gmi_mcast,
> > it seems to either return success (0) or assert.  Would it be reasonable
> > to have gmi_mcast call gmi_send_ok, and return status if it can't queue
> > the message?
> > 
> 
> I have not started the callback code.  It should be pretty simple to
> add.  If you want to use this mechanism, I'll add it in.  Ideas on the
> interface?  Something like:
> 
> gmi_fc_open_create (void *handle, int (*callback_fn) (void *data), void *data);
> gmi_fc_open_destroy (void *handle);
> (flow control opened register/unregister)

This sounds useful.  One thing that I have been concerned about with my
current approach is that if for some reason I can't do a gmi_mcast, I
won't have a way to try again later, since I use the receipt of the
message that I send to trigger the next one.  (I haven't seen this
happen yet, though.)

How does the function calling my callback code know that it can handle
the mcast that I want to do?  Do I need to call gmi_send_ok first and
just return 0 if I can't send my message?

> 
> If callback_fn returns -1, no more callbacks will be called.  This would
> indicate the outgoing queues are full.  If callback_fn returns 0, more
> callbacks would be called until all have been called or -1 is returned.
> 
> If a new configuration change comes while fc_open_destroy is pending,
> call destroy to start the recovery over (with a new data element).

I'm not sure that I understand this.  Does this mean that if a new
configuration change happens while I have an active callback, I
destroy the current one and create a new one?
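
A rough sketch of the return-value rule quoted above, in C.  This is not
openais code: fc_register, fc_dispatch, recovery_cb and struct
recovery_state are hypothetical names, and the callback signature
(taking only the data pointer) is an assumption, since the proposal
does not spell it out.  The callback stops the chain by returning -1
when its (simulated) outgoing queue is full.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CALLBACKS 8

/* Assumed callback type: 0 = keep going, -1 = outgoing queues are full. */
typedef int (*fc_open_fn) (void *data);

struct fc_entry {
	fc_open_fn fn;
	void *data;
};

static struct fc_entry fc_table[MAX_CALLBACKS];
static int fc_count = 0;

/* Hypothetical stand-in for the proposed register call. */
static int fc_register (fc_open_fn fn, void *data)
{
	if (fc_count == MAX_CALLBACKS) {
		return (-1);
	}
	fc_table[fc_count].fn = fn;
	fc_table[fc_count].data = data;
	fc_count++;
	return (0);
}

/*
 * Call the registered callbacks in order and stop at the first one
 * that returns -1.  Returns how many callbacks were invoked.
 */
static int fc_dispatch (void)
{
	int i;
	int called = 0;

	for (i = 0; i < fc_count; i++) {
		called++;
		if (fc_table[i].fn (fc_table[i].data) == -1) {
			break;
		}
	}
	return (called);
}

/* Example recovery state with a simulated amount of queue space. */
struct recovery_state {
	int queue_space;
	int sent;
};

/* Example callback: send one message if there is room, else report full. */
static int recovery_cb (void *data)
{
	struct recovery_state *state = data;

	if (state->queue_space == 0) {
		return (-1);
	}
	state->queue_space--;
	state->sent++;
	return (0);
}
```

In this sketch the callback itself checks for room before sending, which
is one possible answer to the gmi_send_ok question above; whether the
real callback would need to do that is still open in this thread.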

> 
> We can make gmi_mcast not assert, but we have to be careful.  The assert
> is in there to catch bugs.  If a caller calls gmi_mcast, and the
> message can't be queued, then in every case in the current openais that
> is a serious bug.  This shouldn't happen today with the flow control
> code (which is why there is an assert there).

As long as the mcast is done from the library interface side.

> 
> If we change the semantics of gmi_mcast, by allowing it to fail to queue
> without asserting, we should be careful to either handle return values
> (in the case of recovery) or assert where a -1 return value shouldn't
> happen.
> 
> So most of the gmi_mcast calls would be something like:
> 
> res = gmi_mcast (...)
> assert (res == 0);
> 
> for all of the services, except in the case where a res of -1 can be
> handled (such as merge recovery).

Yes.
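
A minimal sketch of the two calling patterns discussed above, assuming
gmi_mcast were changed to return -1 instead of asserting.  The functions
sketch_send_ok and sketch_mcast are hypothetical stand-ins with made-up
signatures, not the real gmi_send_ok/gmi_mcast, and the two-slot queue
is only there to make the failure case reachable.

```c
#include <assert.h>

#define QUEUE_MAX 2

static int queue_used = 0;

/* Hypothetical stand-in for gmi_send_ok: nonzero if a message fits. */
static int sketch_send_ok (void)
{
	return (queue_used < QUEUE_MAX);
}

/* Hypothetical non-asserting mcast: 0 if queued, -1 if the queue is full. */
static int sketch_mcast (const char *msg)
{
	(void)msg;
	if (!sketch_send_ok ()) {
		return (-1);
	}
	queue_used++;
	return (0);
}

/* Service path: failing to queue is a bug, so assert on the result. */
static void service_send (const char *msg)
{
	int res;

	res = sketch_mcast (msg);
	assert (res == 0);
	(void)res;
}

/*
 * Recovery path: a -1 result is expected and handled.  Returns the
 * number of messages queued so the caller can resume from there later.
 */
static int recovery_send (const char **msgs, int count)
{
	int i;

	for (i = 0; i < count; i++) {
		if (sketch_mcast (msgs[i]) == -1) {
			break;
		}
	}
	return (i);
}
```

The recovery path keeps its position in the message list, so it can pick
up where it left off the next time it is told the queue has room.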

Mark.

-- 
Mark Haverkamp <markh at osdl.org>