[Openais] OpenAIS ring marked FAULTY - administrative intervention required

Steven Dake sdake at redhat.com
Thu Apr 8 21:47:29 PDT 2010


On Fri, 2010-04-09 at 09:01 +0930, Darren Thompson wrote:
> Steven & team
> 
> I see that this fork was "well reasoned" in which case it should have
> been completed more fully. Since OpenAIS is effectively 'deprecated',
> it should be fully dismantled in favour of a Corosync (AIS) API
> add-on. In practice it already is one (you cannot install OpenAIS
> without first installing Corosync).
> 
> I chose "ugly duckling" deliberately as I do believe there are the
> makings of a "beautiful swan" in here somewhere... It just appears
> very "messy" at the moment.
> 
> From an external perspective (user, not developer) there appears to
> have been an exponential increase in components (complexity) with
> little (apparent) gain in functionality. This may seem harsh but bear
> with me whilst I try to show it from my perspective.
> 
> Component/complexity timeline (vs. apparent functionality gain):
> 
> 1. Start: Heartbeat 1 => primitive, two-node cluster
> 2. Heartbeat 2.x (HB 1 + internal CRM database) => massive increase in
> functionality, multi-node clusters are now possible.
> 3. Heartbeat 2.9x + Pacemaker (CRM database becomes separate product)
> => little functionality gain, adds support for OpenAIS
> 4. OpenAis + Heartbeat 2.9x + Pacemaker => Same apparent functionality
> as "3" above.
> 5. Corosync + OpenAis + "cluster glue" (Heartbeat components renamed)
> + Pacemaker => current situation required to match functionality of
> "3" above.
> 6. (what I believe you're saying is the preferred development direction)
> - Option A: Corosync + "cluster glue" + Pacemaker => functionality
> lost, removes support for OpenAIS (functionality as "2" above)
> - Option B: Corosync + (optional/deprecated) OpenAis + "cluster
> glue" (Heartbeat components renamed) + Pacemaker => "status quo"
> 
> I know your team have been working very hard to stabilize and improve
> both pacemaker and corosync (I'm on the mail list) but I'm struggling
> to see the significant improvement in functionality over that provided
> by Heartbeat 2.x, especially as you appear to want to deprecate
> the "OpenAis" compatibility.
> 
> It's now especially hard to determine what documentation is relevant,
> as these products are an apparent mish-mash of old and new
> functionality.
> 

Thank you for the feedback.  I know it's hard to see the direct
improvements in corosync vs openais, since the majority of the new work
has focused on common infrastructure, with quality first and performance
second.  One of the major goals of corosync early on was to keep parity
with the existing user experience while dramatically raising the bar on
quality and performance.  If our quality or performance were poor, those
would be higher-priority issues than the feature set.

Keep in mind that everyone who works on Corosync is a true innovator at
heart, and you will see great innovations coming in the future.  You
have seen glimpses of this on the mailing list with some of the
prototype work Angus and Honza have done with node self monitoring and
our HA-aware application simple availability manager.  There is much
more to come and I hope you will value our art in the long run even
though it may appear confusing from a packaging standpoint in the short
term.

Thank you again for using the software our community produces.

Regards
-steve

> I will now 'STFU' and go back to passively monitoring the mail list
> and trying to implement these components in production.
> I hope I have at least highlighted how messy this all looks from
> the "end user's perspective".
> 
> Thanks for your time.
> 
> Darren
> 
> 
> On Wed, 2010-04-07 at 10:22 -0700, Steven Dake wrote: 
> > On Wed, 2010-04-07 at 09:18 +0930, Darren Thompson wrote:
> > > Steven
> > > 
> > > I still do not understand why Corosync was forked off from OpenAIS.
> > > 
> > 
> > Answered in faq entry:
> > http://www.corosync.org/doku.php?id=faq:why
> > 
> > > We now have an incompatible mess of two 80%-overlapping and partially
> > > co-dependent applications (you cannot even install OpenAIS now
> > > without first installing Corosync).
> > 
> > Corosync and OpenAIS are not co-dependent.  Corosync is a standalone
> > component, whereas openais depends on Corosync.  It is true there is
> > some overlap in API feature set.
> > 
> > Most people don't require the SA Forum AIS APIs, in which case there is
> > no point in installing openais.  In all of our deployments, AIS APIs
> > account for less than 5%.  That means either the AIS implementation
> > in openais is bad or it is irrelevant.  I tend to believe the
> > implementation is pretty good...
> >  
> > > 
> > > What would it take to re-merge these two ugly ducklings back into a
> > > single cohesive product again?
> > 
> > The two projects have differing missions.  My philosophy is to do one
> > thing, do it well.
> > 
> > Regards
> > -steve
> > 
> > > Darren
> > > 
> > > 
> > > 
> > > On Tue, 2010-04-06 at 15:57 -0700, Steven Dake wrote: 
> > > > On Tue, 2010-04-06 at 15:26 +0200, Filip Sakalos wrote:
> > > > > Hi,
> > > > > 
> > > > > I am using openAIS and Pacemaker for clustering. I want to use two
> > > > > rings for communication between nodes. The problem is that one of the
> > > > > rings is always marked as faulty on one or both nodes:
> > > > > 
> > > > >  xen1:/home/filip # openais-cfgtool -s
> > > > >  Printing ring status.
> > > > >  RING ID 0
> > > > >          id      = 192.168.58.124
> > > > >          status  = Marking ringid 0 interface 192.168.58.124 FAULTY - adminisrtative intervention required.
> > > > >  RING ID 1
> > > > >          id      = 192.168.7.1
> > > > >          status  = ring 1 active with no faults
> > > > > 
> > > > > 
> > > > > Same on the other node:
> > > > > 
> > > > > xen2:~ # openais-cfgtool -s
> > > > > Printing ring status.
> > > > > RING ID 0
> > > > >         id      = 192.168.58.172
> > > > >         status  = Marking seqid 12298 ringid 0 interface 192.168.58.172 FAULTY - adminisrtative intervention required.
> > > > > RING ID 1
> > > > >         id      = 192.168.7.2
> > > > >         status  = ring 1 active with no faults
> > > > > 
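A minimal Python sketch for reading ring status output like the above. It assumes only the layout shown by `openais-cfgtool -s` in this message (a `RING ID n` line, then `id =` and `status =` fields) and reports which rings mention FAULTY. The helper name `faulty_rings` is hypothetical and not part of the openais tooling.

```python
import re
from typing import Dict

# Assumed layout, as in the output above: "RING ID n", then "id = <addr>",
# then "status = <text>", where the status text may span several lines.
RING_RE = re.compile(
    r"RING ID (\d+)\s+id\s*=\s*(\S+)\s+status\s*=\s*(.*?)(?=RING ID|\Z)",
    re.S,
)


def faulty_rings(status_text: str) -> Dict[int, str]:
    """Return {ring number: interface address} for every ring marked FAULTY."""
    faulty: Dict[int, str] = {}
    for ring, addr, status in RING_RE.findall(status_text):
        if "FAULTY" in status:
            faulty[int(ring)] = addr
    return faulty


# Status reported on xen2 in the message above.
SAMPLE = """Printing ring status.
RING ID 0
        id      = 192.168.58.172
        status  = Marking seqid 12298 ringid 0 interface 192.168.58.172 FAULTY - adminisrtative intervention required.
RING ID 1
        id      = 192.168.7.2
        status  = ring 1 active with no faults
"""

print(faulty_rings(SAMPLE))  # {0: '192.168.58.172'}
```

On a live node the text could be captured with `subprocess.run(["openais-cfgtool", "-s"], capture_output=True, text=True).stdout`, assuming the tool is on the PATH.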
> > > > > This is my configuration file (/etc/ais/openais.conf):
> > > > > 
> > > > > # Please read the openais.conf.5 manual page
> > > > > 
> > > > > aisexec {
> > > > >     # Run as root - this is necessary to be able to manage resources with Pacemaker
> > > > >     user:    root
> > > > >     group:    root
> > > > > }
> > > > > 
> > > > > service {
> > > > >     # Load the Pacemaker Cluster Resource Manager
> > > > >     ver:       0
> > > > >     name:      pacemaker
> > > > >     use_mgmtd: 1
> > > > > }
> > > > > 
> > > > > totem {
> > > > >     version: 2
> > > > > 
> > > > >     # How long before declaring a token lost (ms)
> > > > >     token:          1000
> > > > > 
> > > > >     # How many token retransmits before forming a new configuration
> > > > >     token_retransmits_before_loss_const: 10
> > > > > 
> > > > >     # How long to wait for join messages in the membership protocol (ms)
> > > > >     join:           60
> > > > > 
> > > > >     # How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)
> > > > >     consensus:      1500
> > > > > 
> > > > >     # Turn off the virtual synchrony filter
> > > > >     vsftype:        none
> > > > > 
> > > > >     # Number of messages that may be sent by one processor on receipt of the token
> > > > >     max_messages:   20
> > > > > 
> > > > >     # Stagger sending the node join messages by 1..send_join ms
> > > > >     send_join: 45
> > > > > 
> > > > >     # Limit generated nodeids to 31-bits (positive signed integers)
> > > > >     clear_node_high_bit: yes
> > > > > 
> > > > >     # Enable authentication and encryption
> > > > >     secauth:    on
> > > > > 
> > > > >     # How many threads to use for encryption/decryption
> > > > >     threads:       0
> > > > > 
> > > > >     # Optionally assign a fixed node id (integer)
> > > > >     # nodeid:         1234
> > > > > 
> > > > >     rrp_mode: passive
> > > > > 
> > > > >     interface {
> > > > >         ringnumber: 0
> > > > >         # The following values need to be set based on your environment
> > > > >         bindnetaddr: 192.168.58.0
> > > > >         mcastaddr: 226.94.1.1
> > > > >         mcastport: 5405
> > > > >     }
> > > > > 
> > > > >     interface {
> > > > > 
> > > > >         ringnumber: 1
> > > > >         bindnetaddr: 192.168.7.0
> > > > >         mcastaddr: 226.94.1.2
> > > > >         mcastport: 5405
> > > > >     }
> > > > > }
> > > > > 
> > > > > #logging {
> > > > > #    debug: off
> > > > > #    fileline: off
> > > > > #    to_syslog: yes
> > > > > #    to_stderr: off
> > > > > #    syslog_facility: daemon
> > > > > #    timestamp: on
> > > > > #}
> > > > > 
> > > > > logging {
> > > > >     debug: on
> > > > >     to_file: yes
> > > > >     logfile: /var/log/openais.log
> > > > >     to_syslog: yes
> > > > >     syslog_facility: daemon
> > > > >     timestamp: on
> > > > > }
> > > > > 
> > > > > amf {
> > > > >     mode: disabled
> > > > > }
> > > > > 
> > > > > #eof
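The configuration format above nests `name { ... }` sections containing `key: value` lines, with `#` starting a comment. As an illustration only, here is a minimal Python parser for that format. It is a hypothetical helper (`parse_conf`), not the parser openais itself uses, and it ignores details such as values containing `#`. Repeated sections, such as the two `interface` blocks, are collected into lists so that both rings survive parsing.

```python
from typing import Any, Dict, List


def parse_conf(text: str) -> Dict[str, Any]:
    """Parse brace-delimited 'key: value' sections into nested dicts.

    Each section name maps to a list of dicts, because sections such as
    'interface' can appear more than once inside 'totem'.
    """
    root: Dict[str, Any] = {}
    stack: List[Dict[str, Any]] = [root]
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        if line.endswith("{"):
            section: Dict[str, Any] = {}
            stack[-1].setdefault(line[:-1].strip(), []).append(section)
            stack.append(section)
        elif line == "}":
            if len(stack) == 1:
                raise ValueError("unbalanced '}'")
            stack.pop()
        else:
            key, _, value = line.partition(":")
            stack[-1][key.strip()] = value.strip()
    if len(stack) != 1:
        raise ValueError("unclosed section")
    return root


# Excerpt of the totem section from the configuration above.
SAMPLE = """
totem {
    version: 2
    rrp_mode: passive
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.58.0
        mcastaddr: 226.94.1.1
        mcastport: 5405
    }
    interface {
        ringnumber: 1
        bindnetaddr: 192.168.7.0
        mcastaddr: 226.94.1.2
        mcastport: 5405
    }
}
"""

totem = parse_conf(SAMPLE)["totem"][0]
for iface in totem["interface"]:
    print(iface["ringnumber"], iface["bindnetaddr"],
          iface["mcastaddr"], iface["mcastport"])
```

Values are kept as strings; converting fields such as `ringnumber` or `mcastport` to integers is left to the caller.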
> > > > > 
> > > > > I can ping the other node without problems, and ssh works too. Can anyone help?
> > > > > 
> > > > > 
> > > > 
> > > > I recommend using corosync instead of openais.  Corosync is much more
> > > > suitable for running pacemaker yet is nearly the same from a user
> > > > perspective (similar configuration, etc).
> > > > 
> > > > Provide the syslog output for the two nodes.
> > > > 
> > > > Run ifconfig on the nodes and paste the output.
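One thing the ifconfig output makes it possible to check is that each ring's `bindnetaddr` lines up with a local interface. As we understand it, openais/corosync selects the local interface whose address, masked with that interface's netmask, equals `bindnetaddr`. The Python sketch below performs only that comparison. The /24 prefix length is an assumption (no netmasks were posted), and `ring_matches` is a hypothetical helper.

```python
import ipaddress


def ring_matches(iface_addr: str, prefixlen: int, bindnetaddr: str) -> bool:
    """Return True if iface_addr's network equals bindnetaddr's network.

    Assumption: both addresses are masked with the interface's netmask
    (given here as a prefix length) before being compared.
    """
    iface_net = ipaddress.ip_interface(f"{iface_addr}/{prefixlen}").network
    bind_net = ipaddress.ip_interface(f"{bindnetaddr}/{prefixlen}").network
    return iface_net == bind_net


# Ring addresses reported for xen1 and xen2, and each ring's bindnetaddr
# from the posted configuration; the /24 netmask is assumed, not reported.
NODES = {
    "xen1": {0: "192.168.58.124", 1: "192.168.7.1"},
    "xen2": {0: "192.168.58.172", 1: "192.168.7.2"},
}
BINDNETADDR = {0: "192.168.58.0", 1: "192.168.7.0"}

for node, rings in NODES.items():
    for ring, addr in rings.items():
        ok = ring_matches(addr, 24, BINDNETADDR[ring])
        print(f"{node} ring {ring}: {addr} -> {'match' if ok else 'NO MATCH'}")
```

Under the assumed /24 netmask, every ring address matches its `bindnetaddr`. If ifconfig reports a different netmask, rerunning the check with the real prefix length would show whether that assumption holds.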
> > > > 
> > > > Regards
> > > > -steve
> > > > 
> > > > > 
> > > > > Sincerely,
> > > > > Filip Sakalos
> > > > > _______________________________________________
> > > > > Openais mailing list
> > > > > Openais at lists.linux-foundation.org
> > > > > https://lists.linux-foundation.org/mailman/listinfo/openais
> > > > 
> > 


