[Openais] status of code in bk?

Steven Dake sdake at mvista.com
Wed Feb 9 17:11:33 PST 2005


Kristen,
I'd suggest playing with the timing and reporting the lowest values
which work for you.  I intend to spend some time on determining this, but
it's low priority for now.  I'd expect that the following aggressive
values should work in a LAN setting.  If they don't, try increasing them
(scaling all values by the same multiplier).

TIMEOUT_STATE_GATHER_JOIN 40
TIMEOUT_STATE_GATHER_CONSENSUS 80 (should be double join)
TIMEOUT_TOKEN 90
TIMEOUT_TOKEN_RETRANSMIT 30

You may be able to get TIMEOUT_TOKEN down to 60, at the cost of a higher
chance of reconfigurations.
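
To make the "scale all values by the same multiplier" advice concrete,
here is a minimal sketch in C.  The struct, the function name and the
num/den way of expressing the multiplier are all hypothetical and are
not part of the openais source; only the four baseline numbers and the
"consensus is double join" rule come from the values above.

```c
#include <assert.h>

/* Hypothetical container for the four timeouts (msec).  The fields
 * mirror the TIMEOUT_* constants, but the struct is illustrative only. */
struct ring_timeouts {
    unsigned int join;             /* TIMEOUT_STATE_GATHER_JOIN */
    unsigned int consensus;        /* TIMEOUT_STATE_GATHER_CONSENSUS */
    unsigned int token;            /* TIMEOUT_TOKEN */
    unsigned int token_retransmit; /* TIMEOUT_TOKEN_RETRANSMIT */
};

/* Aggressive LAN baseline quoted in this message. */
static const struct ring_timeouts lan_baseline = { 40, 80, 90, 30 };

/* Scale every timeout by num/den so the ratios between them are kept.
 * Integer math truncates, so consensus is derived from the scaled join
 * value to stay at exactly double join. */
static struct ring_timeouts scale_timeouts(struct ring_timeouts base,
                                           unsigned int num,
                                           unsigned int den)
{
    struct ring_timeouts t;

    assert(num > 0 && den > 0);
    t.join = base.join * num / den;
    t.consensus = 2 * t.join;
    t.token = base.token * num / den;
    t.token_retransmit = base.token_retransmit * num / den;
    return t;
}
```

For example, scale_timeouts(lan_baseline, 3, 2) applies a 1.5x
multiplier; the resulting numbers would then be copied into the
TIMEOUT_* defines by hand.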

There was no intent to change the timing values.  I must have made the
change during debugging.  I often change these values to test
different timeouts and may have inadvertently committed that
change.

When calculating the timeout for the token, I find that a token should
spend about 300 usec at each processor if there are no messages to
multicast.  With 16 processors, that is about 2 msec.  If the token
doesn't rotate within TIMEOUT_TOKEN, a reconfiguration occurs.  If you add
one processor multicasting 40 messages per ring rotation, a token may
take 5-6 msec to rotate.  Given that, 90 msec is a sufficient wait
before declaring the token lost.
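
The back-of-envelope estimate above can be sketched in C as follows.
The function names are hypothetical and this is not how openais computes
anything; it only restates the arithmetic.  Note that taking 300 usec
per processor at face value gives 16 * 300 = 4800 usec for an idle
rotation, which is higher than the roughly 2 msec quoted, so the per-hop
cost is left as a parameter rather than hard-coded.

```c
#include <assert.h>

/* Idle rotation time in usec: the token visits each processor once. */
static unsigned long idle_rotation_usec(unsigned int processors,
                                        unsigned long per_hop_usec)
{
    return (unsigned long)processors * per_hop_usec;
}

/* How many full rotations of the given length fit inside the token
 * timeout.  A larger result means more headroom before a slow token is
 * mistaken for a lost one. */
static unsigned long rotations_per_timeout(unsigned long timeout_msec,
                                           unsigned long rotation_usec)
{
    assert(rotation_usec > 0);
    return timeout_msec * 1000UL / rotation_usec;
}
```

With a loaded rotation of 6 msec, rotations_per_timeout(90, 6000)
yields 15 and rotations_per_timeout(60, 6000) yields 10, which is the
sense in which 90 msec leaves comfortable headroom and 60 msec leaves
somewhat less.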

I eventually intend to calculate the ring timeouts dynamically
during ring formation, but this work is quite a bit out
(maybe even next year).

Thanks
-steve

On Wed, 2005-02-09 at 17:41, Kristen Smith wrote:
> Steve,
> 
> One thing I notice when running the latest bitkeeper code is that the
> time it takes to notice that another node has failed has increased. If
> I start up 2 aisexecs (one on each node) and then ctrl-c one of them,
> the other takes a few seconds to notice that the node went away. When
> we started using the totem-ais code in Jan, I was impressed that the
> time to notice the failure was shorter (almost instantaneous) than it
> had been with the previous openais, but now it seems slower
> than with the previous openais (before the totem changes).
> 
> Are there new configuration parameters that I need to muck with to get the
> node failure detection time down? (I did see your email a while back
> on decreasing this time, I was just wondering if you had intended to
> make the detection time greater in this new code).
> 
> Thanks,
> Kristen
> 
> -----Original Message-----
> From: Steven Dake [mailto:sdake at mvista.com] 
> Sent: Tuesday, February 08, 2005 3:29 PM
> To: Smith, Kristen [NGC:B675:EXCH]
> Cc: openais at lists.osdl.org; Bajpai, Muni [NGC:B670:EXCH]
> Subject: Re: [Openais] status of code in bk?
> 
> 
> Kristen,
> 
> All of the code is now in bitkeeper.  I'll try to wrap up a freshmeat
> release tomorrow with code coverage reports after running the tests we
> have available.
> 
> Thanks
> -steve
> 
> On Tue, 2005-02-08 at 07:30, Kristen Smith wrote:
> > Hello,
> > 
> > Could you please tell me the status of the latest code that is in 
> > bitkeeper? Does it have all the patches you guys have been putting out
> > for the past few weeks? If not, when do you foresee updating it with
> > all these patches?
> > 
> > Thanks,
> > Kristen
> > 
> > 
> > 
> >
> ______________________________________________________________________
> > _______________________________________________
> > Openais mailing list
> > Openais at lists.osdl.org
> http://lists.osdl.org/mailman/listinfo/openais
> 
> 



