[Openais] Re: Configuration change question

Mark Haverkamp markh at osdl.org
Thu Oct 14 14:37:18 PDT 2004


On Thu, 2004-10-14 at 13:37 -0700, Steven Dake wrote:
> On Thu, 2004-10-14 at 13:26, Daniel McNeil wrote:
> > On Thu, 2004-10-14 at 13:04, Steven Dake wrote:
> > > On Thu, 2004-10-14 at 12:26, Mark Haverkamp wrote:
> > > > Steve,
> > > > 
> > > > If I remember correctly, the code to deliver messages from the previous
> > > > configuration that happens in the transitional configuration isn't there
> > > > yet.  This may explain what I am seeing during the event service
> > > > recovery.  I now track open channels on all nodes and keep track by gmi
> > > > messages for opens and closes.  At reconfig time, each node sends its
> > > > open count for each channel via gmi to update any nodes that may be new.
> > > > What I am seeing is that sometimes the open count that a node receives
> > > > is different than its notion of opens for that node.  I think that maybe
> > > > an open or close was partially distributed then the config change
> > > > happened and some nodes didn't get the open/close.  Is it possible for
> > > 
> > > No this is not possible even with the current code (unless there is a
> > > bug).  All messages will be recovered from the old configuration before
> > > any configuration change is delivered.  If all messages are not
> > > recovered, you will see repeating EVS %d %d %d lines, as I'm sure you
> > > have seen in the past.
> > > 
> > > If a message is sent after a configuration change, it will not be
> > > delivered until the new configuration is formed.
> > > 
> > > The idea of VS is that we can ensure that the messages and configuration
> > > changes occur in the same order on every processor that is a member of
> > > the old and new configuration.  This probably solves the problem you're
> > > having (if it works right).
> > 
> > Steve,
> > 
> > Can you clarify what you mean by "probably solves the problem
> > you're having"?
> 
> Sure.  I mean to say that the code should always ensure that messages
> arrive in the same order.
> 
> > 
> > Is the current code recovering and delivering all old
> > configuration messages before the regular configuration change
> > function gets called?
> > 
> 
> It doesn't recover and deliver all old "configuration messages", but it
> does recover and deliver all regular messages (I think this is what
> you meant).
> 
> > What messages are sent in the transitional configuration?
> > 
> 
> None are sent yet.  This remains unimplemented.  If there were a hole
> at the end of the configuration, then a transitional configuration
> should be delivered, followed by any messages that come after the point
> where the hole was encountered.  This is to indicate to the services
> that "hey, you may be missing an important message relating to your
> operation, so count all further messages as suspect".  The service may
> then ignore them, or try to do some recovery in the next configuration.
> 
> > In Mark's code he is assuming that all outstanding messages
> > have been delivered from previous configuration, then he
> > sends to all nodes the current 'open count' using messages
> > with recovery priority, then unplugs and continues.
> > 
> This seems correct, and given the way the gmi code works, it should work
> perfectly (unless there is a hole, in which case you would know you
> had that problem because openais would continually print out "EVS state"
> with a bunch of numbers over and over).
> 
> > So the current code should be delivering all messages to 
> > all nodes in the same order even through configuration
> > changes, right?
> > 
> 
> You got it.  That's how it's supposed to work.  I really believe it works
> correctly now, except for the hole case which is related to transitional
> configurations.  If you can show it not working, then we have a pretty
> serious bug.

Ok, I think that you are right.  I went back and stuck in a bunch more
debug prints and I think that I found the problem in my code that
processes lost nodes.  I'll send a patch for the fix soon.
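
For illustration, the per-node open count bookkeeping discussed in this
thread can be sketched roughly as below.  This is only a minimal sketch
and not the actual event service code: the names chan_opens, chan_apply,
chan_node_lost, chan_reconcile, chan_total and the MAX_NODES limit are
all made up for the example, and it assumes that opens, closes and the
post-reconfig open counts are delivered in the same order on every node.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 8	/* hypothetical limit, for the sketch only */

/* Hypothetical per-channel table: number of opens held by each node. */
struct chan_opens {
	unsigned int count[MAX_NODES];
};

/* Apply one open or close message delivered in agreed order. */
static void chan_apply(struct chan_opens *c, unsigned int node, bool is_open)
{
	assert(node < MAX_NODES);
	if (is_open)
		c->count[node]++;
	else if (c->count[node] > 0)
		c->count[node]--;	/* a close with no open is ignored */
}

/* A node that left the configuration no longer holds any opens. */
static void chan_node_lost(struct chan_opens *c, unsigned int node)
{
	assert(node < MAX_NODES);
	c->count[node] = 0;
}

/*
 * Handle the open count a node reports after a configuration change.
 * Returns true if it matches the local view.  Otherwise the reported
 * value is adopted and false is returned so the caller can log it.
 */
static bool chan_reconcile(struct chan_opens *c, unsigned int node,
			   unsigned int reported)
{
	assert(node < MAX_NODES);
	if (c->count[node] == reported)
		return true;
	c->count[node] = reported;
	return false;
}

/* Total opens on the channel across all nodes. */
static unsigned int chan_total(const struct chan_opens *c)
{
	unsigned int i, total = 0;

	for (i = 0; i < MAX_NODES; i++)
		total += c->count[i];
	return total;
}
```

If messages really are delivered in the same order everywhere, a false
return from chan_reconcile for a node that was already a member points
at the local bookkeeping rather than at message delivery.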


Thanks,
Mark.

-- 
Mark Haverkamp <markh at osdl.org>
