Active/Active Clustering using GFS2

Last Modified: 2019-02-11
We're trying to set up a two-node active/active cluster on CentOS 7 using Pacemaker, Corosync, DLM, CLVMD, and fencing, with a shared LUN over multipath iSCSI connections. The trouble I'm having is that when I add a GFS2 filesystem as a resource to the cluster, the resource fails to start with an error. However, when I run pcs resource debug-start <resource-id> on both nodes, the filesystem mounts successfully: I was able to write a file from one node and see it on the other, and vice versa. I have a deadline to get this project working by early next week. Here's the pcs status:

[root@ddc-testwp1 /]# pcs status --full
Cluster name: WPCluster

Following stonith devices have the 'action' option set, it is recommended to set 'pcmk_off_action', 'pcmk_reboot_action' instead: vmfence1

Stack: corosync
Current DC: ddc-testwp2 (2) (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
Last updated: Thu Feb  7 19:58:37 2019
Last change: Thu Feb  7 18:30:25 2019 by root via cibadmin on ddc-testwp1

2 nodes configured
8 resources configured

Online: [ ddc-testwp1 (1) ddc-testwp2 (2) ]
Full list of resources:

vmfence1       (stonith:fence_vmware_soap):    Started ddc-testwp2
vmfence2       (stonith:fence_vmware_soap):    Started ddc-testwp1
Clone Set: dlm-clone [dlm]
     dlm        (ocf::pacemaker:controld):      Started ddc-testwp1
     dlm        (ocf::pacemaker:controld):      Started ddc-testwp2
     Started: [ ddc-testwp1 ddc-testwp2 ]
Clone Set: clvmd-clone [clvmd]
     clvmd      (ocf::heartbeat:clvm):  Started ddc-testwp1
     clvmd      (ocf::heartbeat:clvm):  Started ddc-testwp2
     Started: [ ddc-testwp1 ddc-testwp2 ]
Clone Set: wpshared_rsc-clone [wpshared_rsc]
     wpshared_rsc       (ocf::heartbeat:Filesystem):    Stopped
     wpshared_rsc       (ocf::heartbeat:Filesystem):    Stopped
     Stopped: [ ddc-testwp1 ddc-testwp2 ]

Node Attributes:
* Node ddc-testwp1 (1):
* Node ddc-testwp2 (2):

Migration Summary:
* Node ddc-testwp1 (1):
   wpshared_rsc: migration-threshold=1000000 fail-count=1000000 last-failure='Thu Feb  7 19:53:55 2019'
* Node ddc-testwp2 (2):
   wpshared_rsc: migration-threshold=1000000 fail-count=1000000 last-failure='Thu Feb  7 18:06:51 2019'

Failed Actions:
* wpshared_rsc_start_0 on ddc-testwp1 'not installed' (5): call=27, status=complete, exitreason='Couldn't find device [/dev/wpsharedvg/wpsharedlv]. Expected /dev/??? to exist',
    last-rc-change='Thu Feb  7 19:53:55 2019', queued=0ms, exec=96ms
* wpshared_rsc_start_0 on ddc-testwp2 'unknown error' (1): call=30, status=complete, exitreason='Couldn't mount device [/dev/wpsharedvg/wpsharedlv] as /wpshared',
    last-rc-change='Thu Feb  7 18:06:50 2019', queued=0ms, exec=168ms

PCSD Status:
  ddc-testwp1: Online
  ddc-testwp2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

Please help!
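
For reference, the manual test described above (starting the resource with debug-start, then writing a file on one node and checking for it on the other) can be sketched roughly as follows. The resource ID and mount point are taken from the pcs status output; the test file name is hypothetical, and the script only attempts the test where pcs is installed. This is an illustration of the check, not the exact commands that were run.

```shell
#!/bin/sh
# Sketch of the manual test: start the Filesystem resource outside the
# cluster's control, then write a marker file the other node should see.
# RESOURCE_ID and MOUNT_POINT come from the pcs status output;
# TEST_FILE is a hypothetical name chosen for illustration.
RESOURCE_ID="wpshared_rsc"
MOUNT_POINT="/wpshared"
TEST_FILE="$MOUNT_POINT/gfs2-test-$(uname -n)"

manual_mount_test() {
    # Run the resource agent's start action directly on this node.
    pcs resource debug-start "$RESOURCE_ID" --full || return 1
    # Confirm the mount point is actually a mounted filesystem.
    mountpoint -q "$MOUNT_POINT" || return 1
    # Write a marker file; the other node should list it under MOUNT_POINT.
    date > "$TEST_FILE" && ls -l "$MOUNT_POINT"
}

# Only attempt the test where pcs is installed (i.e. on a cluster node).
if command -v pcs >/dev/null 2>&1; then
    manual_mount_test
else
    echo "pcs not found; run this on a cluster node as root"
fi
```

Run it as root on each node in turn; after the second run, both marker files should be visible under /wpshared from either node.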
