[Openais] status of code in bk?

Steven Dake sdake at mvista.com
Fri Feb 11 10:46:33 PST 2005


Kristen,
When a reconfiguration occurs, the algorithm goes through the entire
membership state machine (which is very expensive, on the order of 40
msec or more).  This could take longer if there are messages to recover
in the EVS state.

I'd recommend increasing TIMEOUT_TOKEN to some larger value.

I did some informal tests and found that with 2 processors sending the
maximum data, a token timeout of 180 msec seems to work well.
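
As a rough sanity check on these numbers, here is a small sketch (not code
from openais).  It scales a set of timeouts by one multiplier, as suggested
in the quoted message below, and reports how many token rotations fit inside
a given TIMEOUT_TOKEN.  The helper names are made up for illustration, all
values are assumed to be in msec, and the 6 msec rotation time is the loaded
estimate from the quoted message.

```python
# Illustrative sketch only; these helpers are hypothetical and are not
# part of openais.  All values are assumed to be in milliseconds.

# Aggressive LAN values suggested in the quoted message below.
AGGRESSIVE = {
    "TIMEOUT_STATE_GATHER_JOIN": 40,
    "TIMEOUT_STATE_GATHER_CONSENSUS": 80,
    "TIMEOUT_TOKEN": 90,
    "TIMEOUT_TOKEN_RETRANSMIT": 30,
}


def scale_timeouts(timeouts, multiplier):
    """Scale every timeout by the same multiplier, rounding to whole msec."""
    return {name: round(value * multiplier) for name, value in timeouts.items()}


def rotations_before_timeout(token_timeout_ms, rotation_ms):
    """Return how many full token rotations fit inside the token timeout."""
    return token_timeout_ms // rotation_ms


if __name__ == "__main__":
    # Doubling every value yields TIMEOUT_TOKEN 180, which matches the
    # token timeout from the informal test (whether the other values were
    # also scaled in that test is not stated).
    doubled = scale_timeouts(AGGRESSIVE, 2)
    for name, value in doubled.items():
        print(name, value)
    # Assumed loaded rotation time of 6 msec, taken from the quoted estimate.
    print(rotations_before_timeout(doubled["TIMEOUT_TOKEN"], 6))
```

Scaling everything together keeps TIMEOUT_STATE_GATHER_CONSENSUS at double
TIMEOUT_STATE_GATHER_JOIN, which the quoted message asks for.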

Thanks
-steve

On Fri, 2005-02-11 at 09:55, Kristen Smith wrote:
> Steve,
> 
> Right now we have TIMEOUT_TOKEN set to 60 and we periodically see
> reconfigurations. What exactly is going on when a reconfiguration
> occurs? Is it cause for concern when these occur?
> 
> Thanks,
> Kristen
> 
> -----Original Message-----
> From: Steven Dake [mailto:sdake at mvista.com] 
> Sent: Wednesday, February 09, 2005 7:12 PM
> To: Smith, Kristen [NGC:B675:EXCH]
> Cc: openais at lists.osdl.org; Bajpai, Muni [NGC:B670:EXCH]
> Subject: RE: [Openais] status of code in bk?
> 
> 
> Kristen
> I'd suggest playing with the timing and reporting the lowest values
> which work for you.  I intend to spend some time on determining this
> but it's low priority for now.  I'd expect that the following
> aggressive values should work in a LAN setting.  If they don't, try
> increasing them (scaling all values by the same multiplier).
> 
> TIMEOUT_STATE_GATHER_JOIN 40
> TIMEOUT_STATE_GATHER_CONSENSUS 80 (should be double join)
> TIMEOUT_TOKEN 90
> TIMEOUT_TOKEN_RETRANSMIT 30
> 
> You may be able to get TIMEOUT_TOKEN down to 60 with more chance of
> reconfigurations.
> 
> There was no intent to change the timing values.  I must have made the
> change during debugging.  I often change these values to test for
> different timeout values and may have inadvertently committed that
> change.
> 
> When calculating the timeout for the token, I find that a token should
> spend about 300 usec at each processor if there are no messages to
> multicast.  With 16 processors, that is about 2 msec.  If the token
> doesn't rotate in TIMEOUT_TOKEN a reconfiguration occurs.  If you add
> one processor multicasting 40 messages per ring rotation, a token may
> take 5-6 msec to rotate.  Given that, 90 msec is a sufficient wait
> before declaring the token lost.
> 
> I eventually intend to calculate the ring timeouts dynamically
> during ring formation, but this work is quite a bit out (maybe even
> next year).
> 
> Thanks
> -steve
> 
> On Wed, 2005-02-09 at 17:41, Kristen Smith wrote:
> > Steve,
> > 
> > One thing I notice when running the latest bitkeeper code is that the
> > time it takes to notice that another node has failed has increased.  If
> > I start up 2 aisexecs (one on each node) and then ctrl-c one of them,
> > the other takes a few seconds to notice that the node went away.  When
> > we started using the totem-ais code in Jan, I was impressed that the
> > time to notice the failure was shorter (almost instantaneous) than it
> > had been with the previous openais, but now it seems like it is slower
> > than with the previous openais (before the totem changes).
> > 
> > Are there new configuration parms that I need to muck with to get the
> > node failure detection time down? (I did see your email a while back
> > on decreasing this time, I was just wondering if you had intended to
> > make the detection time greater in this new code).
> > 
> > Thanks,
> > Kristen
> > 
> > -----Original Message-----
> > From: Steven Dake [mailto:sdake at mvista.com]
> > Sent: Tuesday, February 08, 2005 3:29 PM
> > To: Smith, Kristen [NGC:B675:EXCH]
> > Cc: openais at lists.osdl.org; Bajpai, Muni [NGC:B670:EXCH]
> > Subject: Re: [Openais] status of code in bk?
> > 
> > 
> > Kristen,
> > 
> > All of the code is now in bitkeeper.  I'll try to wrap up a freshmeat
> > release tomorrow with code coverage reports after running the tests we
> > have available.
> > 
> > Thanks
> > -steve
> > 
> > On Tue, 2005-02-08 at 07:30, Kristen Smith wrote:
> > > Hello,
> > > 
> > > Could you please tell me the status of the latest code that is in
> > > bitkeeper? Does it have all the patches you guys have been putting out
> > > for the past few weeks? If not, when do you foresee updating it with
> > > all these patches?
> > > 
> > > Thanks,
> > > Kristen



