[Openais] Need help to reduce the time wait of saRecvRetry()

Steven Dake sdake at redhat.com
Tue Oct 21 14:54:43 PDT 2008


On Tue, 2008-10-21 at 12:59 +0800, Ratbag Patrick wrote:
> The test environment is simple. There are two nodes. Node A creates a
> checkpoint and writes to it in a tight loop, then A's physical
> connection is dropped. If node B calls saCkptCheckpointOpen or
> saCkptCheckpointRead on the opened checkpoint at about the same time
> (roughly 10 ms after A's connection drops), it blocks in saRecvRetry
> for 3 seconds. After that, saRecvRetry returns OK. The checkpoint is
> about 3 MB, and the network connection between A and B is 1 Gbps.
> 

Can you try reducing your checkpoint size to the maximum supported size
(about 1 MB?) or breaking it into separate checkpoint sections of less
than 1 MB each?  Beyond that limit, bad things will happen.  The API
should report an error and reject the operation when a checkpoint is
this big, but apparently that isn't happening.  I'm not sure whether
that is your issue, but it likely could be.
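
A minimal sketch of the splitting arithmetic, in case it helps.  It
only plans how a large buffer could be cut into slices that each fit
under a per-section ceiling; the names slice_count, plan_slices and
struct section_slice are hypothetical helpers, not part of the openais
API, and the ceiling is a caller-supplied assumption (the real limit is
only "about 1 MB").  Writing each slice into its own checkpoint section
is left out.

```c
#include <assert.h>
#include <stddef.h>

/* One slice of the original buffer (hypothetical helper type). */
struct section_slice {
    size_t offset;  /* byte offset into the original data */
    size_t length;  /* bytes in this slice, never more than max_bytes */
};

/* Number of slices needed to cover total_bytes with slices of at most
 * max_bytes each (ceiling division). */
size_t slice_count(size_t total_bytes, size_t max_bytes)
{
    assert(max_bytes > 0);
    return total_bytes / max_bytes + (total_bytes % max_bytes != 0);
}

/* Fill out[] with at most out_cap slices and return the number of
 * slices required.  If the return value exceeds out_cap, the plan was
 * truncated and the caller needs a larger array. */
size_t plan_slices(size_t total_bytes, size_t max_bytes,
                   struct section_slice *out, size_t out_cap)
{
    size_t needed = slice_count(total_bytes, max_bytes);
    size_t i;

    assert(out != NULL || out_cap == 0);
    for (i = 0; i < needed && i < out_cap; i++) {
        size_t offset = i * max_bytes;
        size_t remaining = total_bytes - offset;

        out[i].offset = offset;
        /* Every slice is full-sized except possibly the last one. */
        out[i].length = remaining < max_bytes ? remaining : max_bytes;
    }
    return needed;
}
```

With a ceiling of 512 KB (an arbitrary value picked to stay well under
1 MB), a 3 MB checkpoint would be planned as six slices, each of which
could go into its own section.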

Regards
-steve

> 
> Thanks.
> 
> 
> > Subject: Re: [Openais] Need help to reduce the time wait of saRecvRetry()
> > From: sdake at redhat.com
> > To: ratbag at live.com
> > CC: openais at lists.linux-foundation.org
> > Date: Mon, 20 Oct 2008 13:14:58 -0700
> >
> >
> > On Mon, 2008-10-20 at 14:20 +0800, Ratbag Patrick wrote:
> >> How To Reproduce:
> >> Using the latest version of whitetank, create one ckpt. If one
> >> node's physical network connection is then dropped (e.g. by
> >> unplugging the RJ45 cable), the other node blocks in
> >> saRecvRetry() for about 3 seconds.
> >>
> >>
> >> That is not acceptable in my environment. Could anyone tell me
> >> how to reduce the blocking time (e.g. from 3 s to 100 ms), or tell
> >> me whether it's a bug?
> >>
> >
> > I'll take a look when I have an opportunity.
> >
> > What is the other node doing in saRecvRetry? Reading the checkpoint in
> > a tight loop?
> >
> > If you have many checkpoints, that call should still not block, but it
> > might return SA_AIS_ERR_TRY_AGAIN for some long period of time.
> >
> > Regards
> > -steve
> >
> >> Thanks!
> >>
> >> _______________________________________________
> >> Openais mailing list
> >> Openais at lists.linux-foundation.org
> >> https://lists.linux-foundation.org/mailman/listinfo/openais
> >
> 


