[Openais] RE: Need help to reduce the time wait of saRecvRetry()

Ratbag Patrick ratbag at live.com
Tue Oct 21 23:31:54 PDT 2008





> Subject: RE: [Openais] Need help to reduce the time wait of saRecvRetry()
> From: sdake at redhat.com
> To: ratbag at live.com
> CC: openais at lists.linux-foundation.org
> Date: Tue, 21 Oct 2008 14:54:43 -0700
>
>
> On Tue, 2008-10-21 at 12:59 +0800, Ratbag Patrick wrote:
>> The test environment is simple. There are two nodes. Node A creates a checkpoint and writes to it in a tight loop, then A's physical connection is dropped. If node B calls saCkptCheckpointOpen or saCkptCheckpointRead on the opened checkpoint at about that time (roughly 10 ms after A's connection drops), it blocks in saRecvRetry for 3 seconds. After that saRecvRetry returns OK. The checkpoint is about 3 MB, and the network connection between A and B is 1 Gbps.
>>
>
> Can you try reducing your checkpoint size to the maximum supported size
> (about 1 MB?) or breaking it into separate checkpoint sections of less
> than 1 MB? Beyond that limit, bad things will happen. The API should
> report an error and reject the operation if you use checkpoint sizes
> this big, but apparently that isn't happening. Not sure if that is your
> issue or not, but it could well be.
>
> Regards
> -steve
>

That's fine. I have changed the checkpoint creation attributes to:

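SaCkptCheckpointCreationAttributesT checkpointCreationAttributes = {
	/* creationFlags, checkpointSize, retentionDuration */
	SA_CKPT_WR_ALL_REPLICAS, 200000, SA_TIME_MAX,
	/* maxSections, maxSectionSize, maxSectionIdSize */
	100, 3000, 24
};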

The checkpoint is now about 100 KB, but the call still blocks in saRecvRetry for the same 3 seconds when the other node's physical connection is dropped.

>>
>> Thanks.
>>
>>
>>> Subject: Re: [Openais] Need help to reduce the time wait of saRecvRetry()
>>> From: sdake at redhat.com
>>> To: ratbag at live.com
>>> CC: openais at lists.linux-foundation.org
>>> Date: Mon, 20 Oct 2008 13:14:58 -0700
>>>
>>>
>>> On Mon, 2008-10-20 at 14:20 +0800, Ratbag Patrick wrote:
>>>> How to reproduce:
>>>> Using the latest version of whitetank, create one checkpoint. If one
>>>> node's physical network connection is then dropped (for example by
>>>> unplugging the RJ45 cable), the other node is blocked in
>>>> saRecvRetry() for about 3 seconds.
>>>>
>>>>
>>>> That is not acceptable in my environment. Could anyone tell me
>>>> how to reduce the blocking time (say from 3 s to 100 ms), or tell me
>>>> whether it's a bug?
>>>>
>>>
>>> I'll take a look when I have an opportunity.
>>>
>>> What is the other node doing in saRecvRetry? Reading the checkpoint in
>>> a tight loop?
>>>
>>> If you have many checkpoints that call should still not block but might
>>> return SA_AIS_ERR_TRY_AGAIN for some long period of time.
>>>
>>> Regards
>>> -steve
>>>
>>>> Thanks!
>>>>
>>>> _______________________________________________
>>>> Openais mailing list
>>>> Openais at lists.linux-foundation.org
>>>> https://lists.linux-foundation.org/mailman/listinfo/openais
>>>
>>
>
