[Openais] OpenAIS ring marked FAULTY - administrative intervention required

Darren Thompson darrent at akurit.com.au
Thu Apr 8 16:31:40 PDT 2010


Steven & team

I see that this fork was "well reasoned", in which case it should have
been completed more fully. Since OpenAIS is effectively 'deprecated', it
should be fully dismantled in favour of a Corosync (AIS) API add-on. It
effectively is one already (you cannot install OpenAIS without first
installing Corosync).

I chose "ugly duckling" deliberately as I do believe there is the
makings of a "beautiful swan" in here somewhere... It just appears very
"messy" at the moment.

From an external perspective (user, not developer) there appears to have
been an exponential increase in components (complexity) with little
(apparent) gain in functionality. This may seem harsh but bear with me
whilst I try to show it from my perspective.

Component/complexity timeline (vs. apparent functionality gain):

1. Start: Heartbeat 1 => primitive, two node cluster
2. Heartbeat 2.x (HB 1 + internal CRM database) => massive increase in
functionality, multi-node clusters are now possible.
3. Heartbeat 2.9x + Pacemaker (CRM database becomes separate product) =>
little functionality gain, adds support for OpenAIS
4. OpenAIS + Heartbeat 2.9x + Pacemaker => same apparent functionality
as "3" above.
5. Corosync + OpenAIS + "cluster glue" (Heartbeat components renamed) +
Pacemaker => current situation, required to match the functionality of
"3" above.
6. (What I believe you're saying is the preferred development direction)
- Option A: Corosync + "cluster glue" + Pacemaker => functionality
lost, removes support for OpenAIS (functionality as "2" above)
- Option B: Corosync + (optional/deprecated) OpenAIS + "cluster
glue" (Heartbeat components renamed) + Pacemaker => "status quo"

I know your team have been working very hard to stabilize and improve
both Pacemaker and Corosync (I'm on the mailing list), but I'm struggling
to see any significant improvement in functionality over that provided by
Heartbeat 2.x, especially as you appear to want to deprecate the
"OpenAIS" compatibility.

It's now especially hard to determine which documentation is relevant,
as these products are an apparent mish-mash of old and new
functionality.

I will now 'STFU' and go back to passively monitoring the mail list and
trying to implement these components in production.
I hope I have at least highlighted how messy this all looks from
the "end user's perspective".

Thanks for your time.

Darren


On Wed, 2010-04-07 at 10:22 -0700, Steven Dake wrote:

> On Wed, 2010-04-07 at 09:18 +0930, Darren Thompson wrote:
> > Steven
> > 
> > I still do not understand why Corosync was forked off from OpenAIS.
> > 
> 
> Answered in faq entry:
> http://www.corosync.org/doku.php?id=faq:why
> 
> > We now have an incompatible mess of two 80% overlapping and partially
> > co-dependent applications (you cannot even install OpenAIS now
> > without first installing Corosync).
> 
> Corosync and OpenAIS are not co-dependent.  Corosync is a standalone
> component, whereas openais depends on Corosync.  It is true there is
> some overlap in API feature set.
> 
> Most people don't require SA Forum AIS APIs, in which case there is no
> point to installing openais.  In all of our deployments, AIS APIs
> account for less than 5%.  That means either our AIS implementation
> produced in openais is bad or it is irrelevant.  I tend to believe the
> implementation is pretty good...
>  
> > 
> > What would it take to re-merge these two ugly ducklings back into a
> > single cohesive product again?
> 
> The two projects have differing missions.  My philosophy is to do one
> thing, do it well.
> 
> Regards
> -steve
> 
> > Darren
> > 
> > 
> > 
> > On Tue, 2010-04-06 at 15:57 -0700, Steven Dake wrote: 
> > > On Tue, 2010-04-06 at 15:26 +0200, Filip Sakalos wrote:
> > > > Hi,
> > > > 
> > > > I am using openAIS and Pacemaker for clustering. I want to use two
> > > > rings for communication between nodes. The problem is that one of the
> > > > rings is always marked as faulty on one or both nodes:
> > > > 
> > > >  xen1:/home/filip # openais-cfgtool -s
> > > >  Printing ring status.
> > > >  RING ID 0
> > > >          id      = 192.168.58.124
> > > >          status  = Marking ringid 0 interface 192.168.58.124 FAULTY - adminisrtative intervention required.
> > > >  RING ID 1
> > > >          id      = 192.168.7.1
> > > >          status  = ring 1 active with no faults
> > > > 
> > > > 
> > > > Same on the other node:
> > > > 
> > > > xen2:~ # openais-cfgtool -s
> > > > Printing ring status.
> > > > RING ID 0
> > > >         id      = 192.168.58.172
> > > >         status  = Marking seqid 12298 ringid 0 interface 192.168.58.172 FAULTY - adminisrtative intervention required.
> > > > RING ID 1
> > > >         id      = 192.168.7.2
> > > >         status  = ring 1 active with no faults
> > > > 
> > > > This is my configuration file (/etc/ais/openais.conf):
> > > > 
> > > > # Please read the openais.conf.5 manual page
> > > > 
> > > > aisexec {
> > > >     # Run as root - this is necessary to be able to manage resources with Pacemaker
> > > >     user:    root
> > > >     group:    root
> > > > }
> > > > 
> > > > service {
> > > >     # Load the Pacemaker Cluster Resource Manager
> > > >     ver:       0
> > > >     name:      pacemaker
> > > >     use_mgmtd: 1
> > > > }
> > > > 
> > > > totem {
> > > >     version: 2
> > > > 
> > > >     # How long before declaring a token lost (ms)
> > > >     token:          1000
> > > > 
> > > >     # How many token retransmits before forming a new configuration
> > > >     token_retransmits_before_loss_const: 10
> > > > 
> > > >     # How long to wait for join messages in the membership protocol (ms)
> > > >     join:           60
> > > > 
> > > >     # How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)
> > > >     consensus:      1500
> > > > 
> > > >     # Turn off the virtual synchrony filter
> > > >     vsftype:        none
> > > > 
> > > >     # Number of messages that may be sent by one processor on receipt of the token
> > > >     max_messages:   20
> > > > 
> > > >     # Stagger sending the node join messages by 1..send_join ms
> > > >     send_join: 45
> > > > 
> > > >     # Limit generated nodeids to 31-bits (positive signed integers)
> > > >     clear_node_high_bit: yes
> > > > 
> > > >     # Enable message authentication and encryption
> > > >     secauth:    on
> > > > 
> > > >     # How many threads to use for encryption/decryption
> > > >     threads:       0
> > > > 
> > > >     # Optionally assign a fixed node id (integer)
> > > >     # nodeid:         1234
> > > > 
> > > >     rrp_mode: passive
> > > > 
> > > >     interface {
> > > >         ringnumber: 0
> > > >         # The following values need to be set based on your environment
> > > >         bindnetaddr: 192.168.58.0
> > > >         mcastaddr: 226.94.1.1
> > > >         mcastport: 5405
> > > >     }
> > > > 
> > > >     interface {
> > > > 
> > > >         ringnumber: 1
> > > >         bindnetaddr: 192.168.7.0
> > > >         mcastaddr: 226.94.1.2
> > > >         mcastport: 5405
> > > >     }
> > > > }
> > > > 
> > > > #logging {
> > > > #    debug: off
> > > > #    fileline: off
> > > > #    to_syslog: yes
> > > > #    to_stderr: off
> > > > #    syslog_facility: daemon
> > > > #    timestamp: on
> > > > #}
> > > > 
> > > > logging {
> > > >     debug: on
> > > >     to_file: yes
> > > >     logfile: /var/log/openais.log
> > > >     to_syslog: yes
> > > >     syslog_facility: daemon
> > > >     timestamp: on
> > > > }
> > > > 
> > > > amf {
> > > >     mode: disabled
> > > > }
> > > > 
> > > > #eof
> > > > 
> > > > I can ping the other node without problem, ssh works too. Can anyone help?
> > > > 
> > > > 
> > > 
> > > I recommend using corosync instead of openais.  Corosync is much more
> > > suitable for running pacemaker yet is nearly the same from a user
> > > perspective (similar configuration, etc).
> > > 
> > > Provide the syslog output for the two nodes.
> > > 
> > > Run ifconfig on the nodes and paste the output.
> > > 
> > > Regards
> > > -steve
> > > 
> > > > 
> > > > Sincerely,
> > > > Filip Sakalos
> > > > _______________________________________________
> > > > Openais mailing list
> > > > Openais at lists.linux-foundation.org
> > > > https://lists.linux-foundation.org/mailman/listinfo/openais
> 
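The ring-status output Filip posted can also be checked mechanically on each node. The Python sketch below is not part of OpenAIS or Corosync: it assumes the `openais-cfgtool -s` layout shown in the transcripts above (a `RING ID n` line followed by `id =` and `status =` lines, with the status possibly wrapped onto the next line by the mailer). That is human-readable output, not a documented format, and the names `RingStatus` and `parse_ring_status` are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class RingStatus:
    ring_id: int
    address: str
    status: str

    @property
    def faulty(self) -> bool:
        # In the transcripts above, a failed ring's status text contains "FAULTY".
        return "FAULTY" in self.status


def parse_ring_status(text: str) -> list[RingStatus]:
    """Parse `cfgtool -s` text shaped like the transcripts above (assumed layout)."""
    rings: list[RingStatus] = []
    ring_id = None
    address = ""
    status_parts: list[str] = []

    def flush() -> None:
        # Emit the ring collected so far, if any.
        if ring_id is not None:
            rings.append(RingStatus(ring_id, address, " ".join(status_parts)))

    for raw in text.splitlines():
        line = raw.strip()
        if m := re.match(r"RING ID (\d+)$", line):
            flush()
            ring_id, address, status_parts = int(m.group(1)), "", []
        elif m := re.match(r"id\s*=\s*(\S+)$", line):
            address = m.group(1)
        elif m := re.match(r"status\s*=\s*(.*)$", line):
            status_parts = [m.group(1).strip()]
        elif line and status_parts:
            # A status message that was wrapped onto the following line.
            status_parts.append(line)
    flush()
    return rings


sample = """\
Printing ring status.
RING ID 0
        id      = 192.168.58.172
        status  = Marking seqid 12298 ringid 0 interface
192.168.58.172 FAULTY - adminisrtative intervention required.
RING ID 1
        id      = 192.168.7.2
        status  = ring 1 active with no faults
"""

for ring in parse_ring_status(sample):
    print(ring.ring_id, ring.address, "FAULTY" if ring.faulty else "ok")
# prints:
# 0 192.168.58.172 FAULTY
# 1 192.168.7.2 ok
```

The same idea should apply to `corosync-cfgtool -s`, Corosync's ring-status command, although its exact wording may differ. Detecting the fault is only the first step: the underlying network problem on ring 0 still has to be found from the syslog and ifconfig output Steve asked for.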
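The totem timing values in Filip's openais.conf can be sanity-checked in a similar way. Corosync's corosync.conf(5) manual page states that `consensus` must be at least 1.2 * `token` (and is derived as 1.2 * `token` when omitted). The posted values (token 1000 ms, consensus 1500 ms) already satisfy this, so the check is not offered as the cause of the FAULTY ring. The helpers `totem_settings` and `check_consensus` are hypothetical: a minimal sketch that assumes the simple `key: value` / `name { ... }` layout shown above and reads only top-level keys of the `totem` section.

```python
def totem_settings(conf_text: str) -> dict[str, str]:
    """Collect top-level `key: value` pairs from the `totem { ... }` section.

    Hypothetical helper: assumes '#' starts a comment, 'name {' opens a
    section and a lone '}' closes it. Nested blocks such as `interface { }`
    are skipped. This is not a complete openais.conf/corosync.conf parser.
    """
    settings: dict[str, str] = {}
    stack: list[str] = []  # names of the currently open sections
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if line.endswith("{"):
            stack.append(line[:-1].strip())
        elif line == "}":
            if stack:
                stack.pop()
        elif stack == ["totem"] and ":" in line:
            key, value = line.split(":", 1)
            settings[key.strip()] = value.strip()
    return settings


def check_consensus(settings: dict[str, str]) -> list[str]:
    """Warn if consensus < 1.2 * token, the minimum stated in corosync.conf(5).

    Integer arithmetic (5 * consensus vs 6 * token) avoids float rounding.
    Missing keys are not checked: daemon defaults are not modelled here.
    """
    if "token" not in settings or "consensus" not in settings:
        return []
    token = int(settings["token"])
    consensus = int(settings["consensus"])
    if 5 * consensus < 6 * token:
        return [f"consensus ({consensus} ms) is below 1.2 * token ({6 * token / 5:g} ms)"]
    return []


example = """
totem {
    version: 2
    token:          1000
    consensus:      1500
    rrp_mode: passive
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.58.0
    }
}
"""

s = totem_settings(example)
print(s["rrp_mode"], check_consensus(s))  # passive []
print(check_consensus({"token": "1000", "consensus": "1000"}))
# ['consensus (1000 ms) is below 1.2 * token (1200 ms)']
```

With the values Filip posted the check passes, so the next diagnostic step is still the syslog and ifconfig output Steve requested.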