[Openais] problem running ocfs2/o2cb with openais/pacemaker

Andrew Beekhof andrew at beekhof.net
Mon Apr 12 05:46:39 PDT 2010


Please keep all replies on the list.

On Apr 12, 2010, at 2:44 PM, Jürgen Herrmann wrote:

> 
> On Mon, 12 Apr 2010 14:25:55 +0200, Andrew Beekhof <andrew at beekhof.net>
> wrote:
>> What versions of openais (corosync?) and pacemaker are you using?
> 
> app1a:~# apt-show-versions |grep pacemaker
> pacemaker/sid upgradeable from 1.0.8-3~bpo50+1 to 1.0.8+hg15494-2
> 
> app1a:~# apt-show-versions |grep openais
> libopenais-dev/lenny uptodate 1.1.2-1~bpo50+1
> libopenais3/lenny uptodate 1.1.2-1~bpo50+1
> openais/lenny uptodate 1.1.2-1~bpo50+1

Looks ok.
Perhaps ping the ocfs2 guys to see which control device it's trying to open.

> 
>> 
>> On Mon, Apr 12, 2010 at 2:00 PM, Jürgen Herrmann
>> <Juergen.Herrmann at xlhost.de> wrote:
>>> 
>>> hi!
>>> 
>>> i'm on debian lenny and trying to run ocfs2 on a dual primary
>>> drbd device. the drbd device is already set up as msDRBD0.
>>> 
>>> to get dlm_controld.pcmk i installed it from source (from
>>> cluster-suite-3.0.10).
>>> then i configured a resource "resDLM", cloned on both nodes:
>>>  primitive resDLM ocf:pacemaker:controld op monitor interval="120s"
>>>  clone cloneDLM resDLM meta globally-unique="false" interleave="true"
>>>  colocation colDLM_DRBD0 inf: cloneDLM msDRBD0:Master
>>>  order ordDRBD0_DLM inf: msDRBD0:promote cloneDLM:start
>>> -> seems to work.
>>> 
>>> 
>>> to get ocfs2_controld.pcmk i installed ocfs2-tools-1.4.3 from source.
>>> after adding the resource:
>>>  primitive resO2CB ocf:pacemaker:o2cb op monitor interval="120s"
>>>  clone cloneO2CB resO2CB meta globally-unique="false" interleave="true"
>>>  colocation colO2CB_DLM inf: cloneO2CB cloneDLM
>>>  order ordDLM_O2CB inf: cloneDLM cloneO2CB
>>> 
>>> i get the following errors in crm_mon:
>>> ======================================
>>> Failed actions:
>>>    resO2CB:0_start_0 (node=app1b.xlhost.de, call=28, rc=1, status=complete): unknown error
>>>    resO2CB:0_start_0 (node=app1a.xlhost.de, call=38, rc=1, status=complete): unknown error
>>> 
>>> 
>>> the relevant syslog entries:
>>> ============================
>>> Apr 12 13:15:18 app1a corosync[4638]:   [pcmk  ] info: pcmk_notify: Enabling node notifications for child 8311 (0xd83090)
>>> Apr 12 13:15:18 app1a ocfs2_controld.pcmk: Error opening control device: Unable to access cluster service
>>> 
>>> 
>>> 
>>> if i start "ocfs2_controld.pcmk -D" i get:
>>> ==========================================
>>> ocfs2_controld[18489]: 2010/04/12_13:40:39 info: init_ais_connection: Creating connection to our AIS plugin
>>> ocfs2_controld[18489]: 2010/04/12_13:40:39 info: init_ais_connection: AIS connection established
>>> ocfs2_controld[18489]: 2010/04/12_13:40:39 info: get_ais_nodeid: Server details: id=569559765 uname=app1a.xlhost.de cname=pcmk
>>> ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_new_peer: Node app1a.xlhost.de now has id: 569559765
>>> ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_new_peer: Node 569559765 is now known as app1a.xlhost.de
>>> 1271072439 setup_stack at 168: Cluster connection established.  Local node id: 569559765
>>> 1271072439 setup_stack at 172: Added Pacemaker as client 1 with fd 5
>>> 1271072439 setup_ckpt at 609: Initializing CKPT service (try 1)
>>> 1271072439 setup_ckpt at 615: Connected to CKPT service with handle 0x327b23c600000000
>>> 1271072439 call_ckpt_open at 160: Opening checkpoint "ocfs2:controld:21f2cad5" (try 1)
>>> 1271072439 call_ckpt_open at 170: Opened checkpoint "ocfs2:controld:21f2cad5" with handle 0x6633487300000000
>>> 1271072439 call_section_write at 340: Writing to section "daemon_max_protocol" on checkpoint "ocfs2:controld:21f2cad5" (try 1)
>>> 1271072439 call_section_create at 292: Creating section "daemon_max_protocol" on checkpoint "ocfs2:controld:21f2cad5" (try 1)
>>> 1271072439 call_section_create at 300: Created section "daemon_max_protocol" on checkpoint "ocfs2:controld:21f2cad5"
>>> 1271072439 call_section_write at 340: Writing to section "ocfs2_max_protocol" on checkpoint "ocfs2:controld:21f2cad5" (try 1)
>>> 1271072439 call_section_create at 292: Creating section "ocfs2_max_protocol" on checkpoint "ocfs2:controld:21f2cad5" (try 1)
>>> 1271072439 call_section_create at 300: Created section "ocfs2_max_protocol" on checkpoint "ocfs2:controld:21f2cad5"
>>> 1271072439 start_join at 588: Starting join for group "ocfs2:controld"
>>> 1271072439 start_join at 592: cpg_join succeeded
>>> 1271072439 loop at 975: setup done
>>> ocfs2_controld[18489]: 2010/04/12_13:40:39 notice: ais_dispatch: Membership 156: quorum acquired
>>> ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_update_peer: Node app1a.xlhost.de: id=569559765 state=member (new) addr=r(0) ip(213.202.242.161)  (new) votes=1 (new) born=156 seen=156 proc=00000000000000000000000000013312 (new)
>>> ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_new_peer: Node app1b.xlhost.de now has id: 586336981
>>> ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_new_peer: Node 586336981 is now known as app1b.xlhost.de
>>> ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_update_peer: Node app1b.xlhost.de: id=586336981 state=member (new) addr=r(0) ip(213.202.242.162)  votes=1 born=148 seen=156 proc=00000000000000000000000000013312
>>> 1271072439 confchg_cb at 495: confchg called
>>> 1271072439 daemon_change at 398: ocfs2_controld (group "ocfs2:controld") confchg: members 1, left 0, joined 1
>>> 1271072439 cpg_joined at 909: CPG is live, we are the first daemon
>>> 1271072439 call_ckpt_open at 160: Opening checkpoint "ocfs2:controld" (try 1)
>>> 1271072439 call_ckpt_open at 170: Opened checkpoint "ocfs2:controld" with handle 0x2ae8944a00000001
>>> 1271072439 call_section_write at 340: Writing to section "daemon_protocol" on checkpoint "ocfs2:controld" (try 1)
>>> 1271072439 call_section_create at 292: Creating section "daemon_protocol" on checkpoint "ocfs2:controld" (try 1)
>>> 1271072439 call_section_create at 300: Created section "daemon_protocol" on checkpoint "ocfs2:controld"
>>> 1271072439 call_section_write at 340: Writing to section "ocfs2_protocol" on checkpoint "ocfs2:controld" (try 1)
>>> 1271072439 call_section_create at 292: Creating section "ocfs2_protocol" on checkpoint "ocfs2:controld" (try 1)
>>> 1271072439 call_section_create at 300: Created section "ocfs2_protocol" on checkpoint "ocfs2:controld"
>>> 1271072439 cpg_joined at 923: Daemon protocol is 1.0
>>> 1271072439 cpg_joined at 925: fs protocol is 1.0
>>> 1271072439 cpg_joined at 927: Connecting to dlm_controld
>>> >>>>>>>>>>>>>>>>>>>>>>>>> here's the error <<<<<<<<<<<<<<<<<<<<<<
>>> 1271072439 cpg_joined at 934: Opening control device
>>> 1271072439 cpg_joined at 938: Error opening control device: Unable to access cluster service
>>> 1271072439 exit_dlmcontrol at 363: Closing dlm_controld connection
>>> 1271072439 start_leave at 613: leaving group "ocfs2:controld"
>>> 1271072439 start_leave at 626: cpg_leave succeeded
>>> 1271072439 exit_cpg at 760: closing cpg connection
>>> 1271072439 call_ckpt_close at 240: Closing checkpoint "ocfs2:controld:21f2cad5" (try 1)
>>> 1271072439 call_ckpt_close at 246: Closed checkpoint "ocfs2:controld:21f2cad5"
>>> 1271072439 exit_ckpt at 643: Disconnecting from CKPT service (try 1)
>>> 1271072439 exit_ckpt at 647: Disconnected from CKPT service
>>> 1271072439 exit_stack at 144: closing pacemaker connection
>>> ocfs2_controld[18489]: 2010/04/12_13:40:39 notice: terminate_ais_connection: Disconnected from AIS
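The node ids in this log appear to be derived from each node's ring0 IPv4 address. A minimal Python sketch, assuming corosync generated the ids itself (no explicit nodeid configured) on a little-endian x86 host and cleared the top bit, as its clear_node_high_bit totem option does; nodeid_from_ipv4 is a hypothetical helper, not part of corosync or ocfs2-tools:

```python
import ipaddress


def nodeid_from_ipv4(addr: str, clear_high_bit: bool = True) -> int:
    """Hypothetical helper: reproduce an auto-generated corosync nodeid.

    Assumption: the nodeid is the 4 address bytes (network order) read
    as a little-endian 32-bit integer, optionally with bit 31 cleared.
    """
    raw = ipaddress.IPv4Address(addr).packed          # e.g. b"\xd5\xca\xf2\xa1"
    nodeid = int.from_bytes(raw, byteorder="little")  # reinterpret as little-endian
    if clear_high_bit:
        nodeid &= 0x7FFFFFFF                          # keep the id positive (31 bits)
    return nodeid


for ip in ("213.202.242.161", "213.202.242.162"):
    nid = nodeid_from_ipv4(ip)
    print(f"{ip} -> nodeid {nid} (hex {nid:08x})")
```

Under these assumptions it prints nodeid 569559765 (hex 21f2cad5) for app1a and 586336981 (hex 22f2cad5) for app1b, matching the ids logged by both ocfs2_controld and dlm_controld; the suffix of the checkpoint name "ocfs2:controld:21f2cad5" is the local nodeid in hex.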
>>> 
>>> 
>>> obviously ocfs2_controld.pcmk can connect to the openais CKPT service and
>>> to dlm_controld.pcmk, which then terminates the connection.
>>> here's the output from dlm_controld.pcmk -q 0 -D:
>>> (the last 6 lines show 3 connection attempts from ocfs2_controld.pcmk!)
>>> =======================================================================
>>> 1271072755 dlm_controld 3.0.10 started
>>> cluster-dlm[20608]: 2010/04/12_13:45:55 info: init_ais_connection: Creating connection to our AIS plugin
>>> cluster-dlm[20608]: 2010/04/12_13:45:55 info: init_ais_connection: AIS connection established
>>> cluster-dlm[20608]: 2010/04/12_13:45:55 info: get_ais_nodeid: Server details: id=569559765 uname=app1a.xlhost.de cname=pcmk
>>> cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_new_peer: Node app1a.xlhost.de now has id: 569559765
>>> cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_new_peer: Node 569559765 is now known as app1a.xlhost.de
>>> 1271072755 found /dev/misc/dlm-control minor 58
>>> 1271072755 found /dev/misc/dlm-monitor minor 57
>>> 1271072755 found /dev/misc/dlm_plock minor 56
>>> 1271072755 /dev/misc/dlm-monitor fd 9
>>> 1271072755 /sys/kernel/config/dlm/cluster/comms: opendir failed: 2
>>> 1271072755 /sys/kernel/config/dlm/cluster/spaces: opendir failed: 2
>>> 1271072755 confdb_key_get error 11
>>> 1271072755 group_mode 3 compat 0
>>> 1271072755 setup_cpg_daemon 11
>>> 1271072755 dlm:controld conf 2 1 0 memb 569559765 586336981 join 569559765 left
>>> 1271072755 run protocol from nodeid 586336981
>>> 1271072755 daemon run 1.1.1 max 1.1.1 kernel run 1.1.1 max 1.1.1
>>> 1271072755 plocks 13
>>> 1271072755 plock cpg message size: 104 bytes
>>> cluster-dlm[20608]: 2010/04/12_13:45:55 notice: ais_dispatch: Membership 156: quorum acquired
>>> cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_update_peer: Node app1a.xlhost.de: id=569559765 state=member (new) addr=r(0) ip(213.202.242.161)  (new) votes=1 (new) born=156 seen=156 proc=00000000000000000000000000013312 (new)
>>> cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_new_peer: Node app1b.xlhost.de now has id: 586336981
>>> cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_new_peer: Node 586336981 is now known as app1b.xlhost.de
>>> cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_update_peer: Node app1b.xlhost.de: id=586336981 state=member (new) addr=r(0) ip(213.202.242.162)  votes=1 born=148 seen=156 proc=00000000000000000000000000013312
>>> 1271072755 Processing membership 156
>>> 1271072755 Adding address ip(213.202.242.161) to configfs for node 569559765
>>> 1271072755 set_configfs_node 569559765 213.202.242.161 local 1
>>> 1271072755 Added active node 569559765: born-on=156, last-seen=156, this-event=156, last-event=0
>>> 1271072755 Adding address ip(213.202.242.162) to configfs for node 586336981
>>> 1271072755 set_configfs_node 586336981 213.202.242.162 local 0
>>> 1271072755 Added active node 586336981: born-on=148, last-seen=156, this-event=156, last-event=0
>>> 1271072763 client connection 5 fd 14
>>> 1271072763 connection 5 read error -1
>>> 1271072776 client connection 5 fd 14
>>> 1271072776 connection 5 read error -1
>>> 1271072779 client connection 5 fd 14
>>> 1271072779 connection 5 read error -1
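The six dlm_controld lines above can be paired up mechanically to confirm that they are three separate attempts. A minimal Python sketch, assuming the debug format shown (epoch seconds followed by the message); failed_client_connections is a hypothetical helper name:

```python
import re
from datetime import datetime, timezone

# Excerpt of the dlm_controld -D output quoted above.
LOG = """\
1271072755 Added active node 586336981: born-on=148, last-seen=156, this-event=156, last-event=0
1271072763 client connection 5 fd 14
1271072763 connection 5 read error -1
1271072776 client connection 5 fd 14
1271072776 connection 5 read error -1
1271072779 client connection 5 fd 14
1271072779 connection 5 read error -1
"""

ACCEPT = re.compile(r"^(\d+) client connection (\d+) fd (\d+)$")
READ_ERR = re.compile(r"^(\d+) connection (\d+) read error (-?\d+)$")


def failed_client_connections(log: str) -> list[tuple[str, int]]:
    """Return (UTC time, connection number) for every accepted client
    connection that is followed by a read error on the same connection."""
    pending = {}   # connection number -> epoch seconds when it was accepted
    failures = []
    for line in log.splitlines():
        if m := ACCEPT.match(line):
            pending[int(m.group(2))] = int(m.group(1))
        elif m := READ_ERR.match(line):
            conn = int(m.group(2))
            if conn in pending:
                ts = pending.pop(conn)
                when = datetime.fromtimestamp(ts, tz=timezone.utc)
                failures.append((when.strftime("%H:%M:%S UTC"), conn))
    return failures


for when, conn in failed_client_connections(LOG):
    print(f"{when}: client connection {conn} accepted, then read error")
```

It reports 11:46:03, 11:46:16 and 11:46:19 UTC, i.e. 13:46:03, 13:46:16 and 13:46:19 local time (CEST, UTC+2): each attempt is accepted by dlm_controld and then ends in a read error on the same connection.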
>>> 
>>> 
>>> 
>>> i'm pretty lost at the moment, as there's nothing i can find via google
>>> regarding the "core" problem:
>>> 1271072439 cpg_joined at 934: Opening control device
>>> 1271072439 cpg_joined at 938: Error opening control device: Unable to access cluster service
>>> 
>>> 
>>> any help would be greatly appreciated.
>>> 
>>> best regards,
>>> jürgen herrmann
>>> --
>>> >> XLhost.de - eXperts in Linux hosting ® <<
>>> 
>>> XLhost.de GmbH
>>> Jürgen Herrmann, Managing Director
>>> Boelckestrasse 21, 93051 Regensburg, Germany
>>> 
>>> Managing Directors: Volker Geith, Jürgen Herrmann
>>> Registered under: HRB9918
>>> VAT identification number: DE245931218
>>> 
>>> Phone:  +49 (0)800 XLHOSTDE [0800 95467833]
>>> Fax:  +49 (0)800 95467830
>>> 
>>> WEB:  http://www.XLhost.de
>>> IRC:  #XLhost at irc.quakenet.org
>>> _______________________________________________
>>> Openais mailing list
>>> Openais at lists.linux-foundation.org
>>> https://lists.linux-foundation.org/mailman/listinfo/openais

-- Andrew




