[Openais] Failover constraint problem

Sandor Feher sfeher at bluesystem.hu
Fri Apr 16 15:21:38 PDT 2010


Hi,

My goal is to set up a two-node Pacemaker cluster to serve our web hosting
service. For testing purposes, this configuration currently runs on two
VMware virtual machines, both running Debian Lenny.

Here are the basic rules I set up:

node0 has:

virtual IP
DRBD primary, with its filesystem mounted under /mnt
NFS server exporting the /mnt mount point to node1

node1 has:

DRBD secondary
nfs_client, which mounts node0's /mnt directory (it should be read-write for both nodes)

If node0 fails, node1 should become the DRBD primary, take over the
virtual IP and mount the DRBD partition under /mnt. It should not start
the nfs_client resource, because that makes no sense there (nfs_client
must be stopped before the DRBD partition gets mounted under /mnt).
If node1 fails, nothing should happen, because nfs_client only runs on
the node that holds the secondary DRBD partition.
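To make the intended behaviour unambiguous, here is a small Python model of the rules above. It is only a sketch: desired_placement is a hypothetical helper, not part of Pacemaker, and it says nothing about start/stop ordering or timing, only where each resource is meant to end up. It also assumes there is no automatic failback (the DRBD primary stays where it is as long as that node is online), which the rules above imply but do not state outright.

```python
NODES = ("node0", "node1")


def desired_placement(online, primary):
    """Return the intended node for each resource (None means stopped).

    online:  set of node names that are currently up, e.g. {"node0", "node1"}.
    primary: node that held the DRBD primary role before the event (or None).
    Assumption: no failback, so the primary only moves if its node is down.
    """
    if primary not in online:
        # Fail over to the first surviving node, in NODES order.
        survivors = [n for n in NODES if n in online]
        primary = survivors[0] if survivors else None
    # The other online node (if any) holds the DRBD secondary.
    secondary = next((n for n in NODES if n in online and n != primary), None)
    return {
        "ms-drbd0:Master": primary,
        "ms-drbd0:Slave": secondary,
        # fs0, virtual-ip and nfs_server always follow the DRBD primary.
        "apache-group": primary,
        # nfs_client only runs on the node holding the DRBD secondary.
        "nfs_client": secondary,
    }
```

For example, desired_placement({"node0", "node1"}, "node0") keeps apache-group on node0 with nfs_client on node1, which is the outcome expected in problem 2 below when node1 rejoins.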

My problems are the following:

1. If I migrate the apache-group resource to the other node, nfs_client
does not release the /mnt mount point (I know that, according to this
config, it should not).
   I think I need some clever constraint to achieve this.

2. If I shut down node1 (suppose node0 is the master at the moment and
runs apache-group), nothing happens, as expected. But when node1 comes
online again, apache-group starts to migrate to node1. I don't
understand why, because there is a constraint that makes apache-group
run on the node holding the primary DRBD resource, and in this
situation that is node0.


crm configure show

node node0 \
        attributes standby="off"
node node1 \
        attributes standby="off"
primitive drbd0 ocf:heartbeat:drbd \
        params drbd_resource="r0" \
        op monitor interval="59s" role="Master" timeout="30s" \
        op monitor interval="60s" role="Slave" timeout="30s"
primitive fs0 ocf:heartbeat:Filesystem \
        params fstype="ext3" directory="/mnt" device="/dev/drbd0" \
        meta target-role="Started"
primitive nfs_client ocf:heartbeat:Filesystem \
        params fstype="nfs" directory="/mnt/" device="192.168.1.40:/mnt/" options="hard,intr,noatime,rw,nolock,tcp,timeo=50" \
        meta target-role="Stopped"
primitive nfs_server lsb:nfs-kernel-server \
        op monitor interval="1min"
primitive virtual-ip ocf:heartbeat:IPaddr2 \
        params ip="192.168.1.40" broadcast="192.168.1.255" nic="eth0" cidr_netmask="24" \
        op monitor interval="21s" timeout="5s" target-role="Started"
group apache-group fs0 virtual-ip nfs_server \
        meta target-role="Started"
ms ms-drbd0 drbd0 \
        meta clone-max="2" notify="true" globally-unique="false" target-role="Started"
location cli-prefer-apache-group apache-group \
        rule $id="cli-prefer-rule-apache-group" inf: #uname eq node0
colocation apache-group-on-ms-drbd0 inf: apache-group ms-drbd0:Master
colocation co_nfs_client inf: nfs_client ms-drbd0:Slave
order ms-drbd0-before-apache-group inf: ms-drbd0:promote apache-group:start
order ms-drbd0-before-nfs_client inf: ms-drbd0:promote nfs_client:start
property $id="cib-bootstrap-options" \
        dc-version="1.0.8-2c98138c2f070fcb6ddeab1084154cffbf44ba75" \
        cluster-infrastructure="openais" \
        stonith-enabled="false" \
        no-quorum-policy="ignore" \
        expected-quorum-votes="2" \
        last-lrm-refresh="1271453094"

node1:~# crm_mon -1
============
Last updated: Fri Apr 16 23:49:30 2010
Stack: openais
Current DC: node0 - partition with quorum
Version: 1.0.8-2c98138c2f070fcb6ddeab1084154cffbf44ba75
2 Nodes configured, 2 expected votes
3 Resources configured.
============

Online: [ node0 node1 ]

 Resource Group: apache-group
     fs0        (ocf::heartbeat:Filesystem):    Started node1 (unmanaged) FAILED
     virtual-ip (ocf::heartbeat:IPaddr2):       Stopped
     nfs_server (lsb:nfs-kernel-server):        Stopped
 Master/Slave Set: ms-drbd0
     Masters: [ node0 ]
     Slaves: [ node1 ]
 nfs_client     (ocf::heartbeat:Filesystem):    Started node1 (unmanaged) FAILED

Failed actions:
    nfs_client_start_0 (node=node0, call=98, rc=1, status=complete): unknown error
    fs0_stop_0 (node=node1, call=9, rc=-2, status=Timed Out): unknown exec error
    nfs_client_stop_0 (node=node1, call=7, rc=-2, status=Timed Out): unknown exec error


I would really appreciate any ideas. Thank you in advance.

Regards, Sandor

