[Openais] status of code in bk?

Kristen Smith kjsmith at nortel.com
Fri Feb 11 08:55:09 PST 2005


Steve,

Right now we have TIMEOUT_TOKEN set to 60 and we periodically see
reconfigurations. What exactly is going on when a reconfiguration occurs? Is
it cause for concern when these occur?

Thanks,
Kristen

-----Original Message-----
From: Steven Dake [mailto:sdake at mvista.com] 
Sent: Wednesday, February 09, 2005 7:12 PM
To: Smith, Kristen [NGC:B675:EXCH]
Cc: openais at lists.osdl.org; Bajpai, Muni [NGC:B670:EXCH]
Subject: RE: [Openais] status of code in bk?


Kristen
I'd suggest playing with the timing and reporting the lowest values which
work for you.  I intend to spend some time on determining this but it's low
priority for now.  I'd expect that the following aggressive values should
work in a LAN setting.  If they don't, try increasing them (scaling all
values by the same multiplier).

TIMEOUT_STATE_GATHER_JOIN 40
TIMEOUT_STATE_GATHER_CONSENSUS 80 (should be double join)
TIMEOUT_TOKEN 90
TIMEOUT_TOKEN_RETRANSMIT 30

You may be able to get TIMEOUT_TOKEN down to 60, at the cost of a higher
chance of reconfigurations.
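
The "scale all values by the same multiplier" advice can be sketched as
follows. This is only an illustration in Python, not openais code: the
helper name scale_timeouts is hypothetical, the units are assumed to be
milliseconds, and the consensus value is recomputed as double the join
value to follow the note next to TIMEOUT_STATE_GATHER_CONSENSUS.

```python
# Baseline values from the message above; units assumed to be milliseconds.
BASE_TIMEOUTS = {
    "TIMEOUT_STATE_GATHER_JOIN": 40,
    "TIMEOUT_STATE_GATHER_CONSENSUS": 80,
    "TIMEOUT_TOKEN": 90,
    "TIMEOUT_TOKEN_RETRANSMIT": 30,
}


def scale_timeouts(base, multiplier):
    """Hypothetical helper: scale every timeout by the same multiplier."""
    scaled = {name: round(value * multiplier) for name, value in base.items()}
    # Keep the consensus timeout at double the join timeout, as suggested.
    scaled["TIMEOUT_STATE_GATHER_CONSENSUS"] = (
        2 * scaled["TIMEOUT_STATE_GATHER_JOIN"]
    )
    return scaled


if __name__ == "__main__":
    # Example: loosen every timeout by 50% if the baseline is too aggressive.
    for name, value in scale_timeouts(BASE_TIMEOUTS, 1.5).items():
        print(name, value)
```

The resulting numbers would still have to be copied into the build by
hand; the sketch only keeps the ratios between the four values intact.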

There was no intent to change the timing values.  I must have made the
change during debugging.  I often change these values to test for different
timeout values and may have inadvertently committed that change.

When calculating the timeout for the token, I find that a token should spend
about 300 usec at each processor if there are no messages to multicast.
With 16 processors, that is about 2 msec.  If the token doesn't rotate
within TIMEOUT_TOKEN, a reconfiguration occurs.  If you add one processor
multicasting 40 messages per ring rotation, a token may take 5-6 msec to
rotate.  Given that, 90 msec is a sufficient timeout for detecting token
loss.
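
The reasoning above can be turned into a rough back-of-the-envelope
check. The sketch below is Python, not openais code, and both function
names are hypothetical. It assumes the idle rotation time is simply the
per-processor hold time multiplied by the number of processors. Note that
a straight multiplication of 300 usec by 16 gives closer to 5 msec than
to the roughly 2 msec quoted, so treat the per-processor figure as an
upper estimate; either way the rotation time is far below 90 msec.

```python
def idle_rotation_usec(processors, hold_usec=300):
    """Estimated time for one token rotation on an idle ring (assumed model)."""
    return processors * hold_usec


def token_margin(processors, timeout_token_msec, hold_usec=300):
    """How many idle rotations fit inside TIMEOUT_TOKEN (larger is safer)."""
    rotation_msec = idle_rotation_usec(processors, hold_usec) / 1000.0
    return timeout_token_msec / rotation_msec


if __name__ == "__main__":
    for timeout in (60, 90):
        print(timeout, "msec allows", token_margin(16, timeout), "idle rotations")
```

Under the same assumptions, a loaded rotation of 6 msec still fits 15
times into a 90 msec TIMEOUT_TOKEN and 10 times into 60 msec, which is
consistent with 60 being workable but leaving less headroom.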

I eventually intend to calculate the ring timeouts dynamically during ring
formation, but this work is quite a bit out (maybe even next year).

Thanks
-steve

On Wed, 2005-02-09 at 17:41, Kristen Smith wrote:
> Steve,
> 
> One thing I notice when running the latest bitkeeper code is that the 
> time it takes to notice that another node has failed has increased. If 
> I start up 2 aisexecs (one on each node) and then ctrl-c one of them, 
> the other takes a few seconds to notice that the node went away. When 
> we started using the totem-ais code in Jan, I was impressed that the 
> time to notice the failure was shorter (almost instantaneous) than it 
> had been with the previous openais, but now it seems to be slower 
> than with the previous openais (before the totem changes).
> 
> Are there new configuration parameters that I need to muck with to get 
> the node failure detection time down? (I did see your email a while 
> back on decreasing this time; I was just wondering if you had intended 
> to make the detection time greater in this new code.)
> 
> Thanks,
> Kristen
> 
> -----Original Message-----
> From: Steven Dake [mailto:sdake at mvista.com]
> Sent: Tuesday, February 08, 2005 3:29 PM
> To: Smith, Kristen [NGC:B675:EXCH]
> Cc: openais at lists.osdl.org; Bajpai, Muni [NGC:B670:EXCH]
> Subject: Re: [Openais] status of code in bk?
> 
> 
> Kristen,
> 
> All of the code is now in bitkeeper.  I'll try to wrap up a freshmeat 
> release tomorrow with code coverage reports after running the tests we 
> have available.
> 
> Thanks
> -steve
> 
> On Tue, 2005-02-08 at 07:30, Kristen Smith wrote:
> > Hello,
> > 
> > Could you please tell me the status of the latest code that is in
> > bitkeeper? Does it have all the patches you guys have been putting
> > out for the past few weeks? If not, when do you foresee updating it
> > with all these patches?
> > 
> > Thanks,
> > Kristen
> > 
> > _______________________________________________
> > Openais mailing list
> > Openais at lists.osdl.org
> > http://lists.osdl.org/mailman/listinfo/openais

