vinhbt
asked on
How can I fix the following HACMP errors?
Dear all,
I have created a two-node cluster on AIX (configuration file attached). The cluster has one resource group containing one service IP label and one volume group.
The status of the cluster is as follows:
bash-3.00# /usr/es/sbin/cluster/utilities/cldump
Obtaining information via SNMP from Node: node1...
_____________________________________________________________________________
Cluster Name: crmcluster
Cluster State: UP
Cluster Substate: STABLE
_____________________________________________________________________________
Node Name: node1 State: UP
Network Name: net_diskhb_01 State: UP
Address: Label: hdisk6 State: UP
Network Name: net_ether_01 State: UP
Address: 10.149.34.38 Label: vnpcrm05 State: UP
Network Name: net_ether_02 State: UP
Address: 1.1.1.1 Label: node1-boot State: UP
Address: 10.149.34.44 Label: vnpcluscrm State: UP
Node Name: node2 State: UP
Network Name: net_diskhb_01 State: DOWN
Network Name: net_ether_01 State: UP
Address: 10.149.34.39 Label: vnpcrm06 State: UP
Network Name: net_ether_02 State: UP
Address: 1.1.1.2 Label: node2-boot State: UP
Cluster Name: crmcluster
Resource Group Name: test
Startup Policy: Online On First Available Node
Fallover Policy: Fallover To Next Priority Node In The List
Fallback Policy: Fallback To Higher Priority Node In The List
Site Policy: ignore
Primary instance(s):
The following node temporarily has the highest priority for this instance:
node1, user-requested rg_move performed on Wed Apr 8 16:42:18 2009
Node Group State
---------------------------- ---------------
node1 ONLINE
node2 OFFLINE
When I run a fallover test, either with shutdown -Fr on one node or with the HACMP function for moving a resource group, the fallover completes. But if I unplug the Ethernet cable or the fibre cable, nothing happens.
Using the Cluster Test Tool, I encounter some errors (some tests failed):
|| Test 1 Complete - NODE_UP: Start cluster services on all available nodes
||
08/04/2009_14:43:37: || Test Completion Status: NOT RATIONAL
|| Test 2 Complete - NETWORK_UP_GLOBAL: Bring up network1 globally
||
08/04/2009_14:52:23: || Test Completion Status: FAILED
||
|| Test 1 Complete - NETWORK_DOWN_GLOBAL: Bring down non-ip network
||
08/04/2009_14:54:01: || Test Completion Status: FATAL
How can I fix these errors?
cluster040909haw.doc
ASKER
Hi woolmilkporc:
I have reconfigured HACMP with the following changes:
/etc/hosts
################HACMP IP ADDRESS##########################
192.168.1.1 vnpcrm05_boot
10.0.0.1 vnpcrm05_standby
10.149.34.38 vnpcrm05
192.168.1.2 vnpcrm06_boot
10.0.0.2 vnpcrm06_standby
10.149.34.39 vnpcrm06
10.149.34.44 vnpcrm_virtual
# ifconfig -a
en0: flags=5e080863,c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),PSEG,LARGESEND,CHAIN>
inet 192.168.1.1 netmask 0xffffffe0 broadcast 192.168.1.31
tcp_sendspace 131072 tcp_recvspace 65536 rfc1323 0
en1: flags=5e080863,c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),PSEG,LARGESEND,CHAIN>
inet 10.0.0.1 netmask 0xffffffe0 broadcast 10.0.0.31
inet 10.149.34.38 netmask 0xffffff00 broadcast 10.149.34.255
tcp_sendspace 131072 tcp_recvspace 65536 rfc1323 0
lo0: flags=e08084b<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT>
inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
inet6 ::1/0
tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1
I also exported the configuration to the attached file (crmcluster.haw).
I still have the same problem: HACMP does not fail over when I unplug the Ethernet cable or the fibre cable.
Running the Cluster Test Tool, I get the attached log file (cl_testtool).
I executed the command lssrc -ls topsvcs, and the output is below:
lssrc -ls topsvcs
0513-036 The request could not be passed to the topsvcs subsystem.
Start the subsystem and try your command again.
Do you have any idea?
cl-testtool.log
crmcluster.haw.txt
ASKER CERTIFIED SOLUTION
There seems to be a problem with your disk heartbeat network net_diskhb_01.
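The DOWN state is easy to miss in a long cldump listing. As a rough sketch only (not an HACMP tool), the Python snippet below scans cldump-style text and reports every network that is not UP, grouped by node. The function name `down_networks` is made up for illustration, the sample text is an excerpt of the output posted in the question, and the line layout is assumed to match that excerpt.

```python
import re

# Excerpt of the cldump output posted in the question.
CLDUMP_OUTPUT = """\
Node Name: node1 State: UP
Network Name: net_diskhb_01 State: UP
Address: Label: hdisk6 State: UP
Network Name: net_ether_01 State: UP
Address: 10.149.34.38 Label: vnpcrm05 State: UP
Node Name: node2 State: UP
Network Name: net_diskhb_01 State: DOWN
Network Name: net_ether_01 State: UP
Address: 10.149.34.39 Label: vnpcrm06 State: UP
"""


def down_networks(cldump_text):
    """Return (node, network) pairs whose network state is not UP."""
    problems = []
    node = None
    for line in cldump_text.splitlines():
        line = line.strip()
        # A "Node Name" line starts a new per-node section.
        match = re.match(r"Node Name:\s*(\S+)\s+State:\s*(\S+)", line)
        if match:
            node = match.group(1)
            continue
        # A "Network Name" line belongs to the most recent node.
        match = re.match(r"Network Name:\s*(\S+)\s+State:\s*(\S+)", line)
        if match and match.group(2) != "UP":
            problems.append((node, match.group(1)))
    return problems


print(down_networks(CLDUMP_OUTPUT))
```

On the posted output this singles out net_diskhb_01 on node2, which also has no device label listed under it, unlike hdisk6 on node1.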
Consider deleting and redefining it. When selecting hdisks, carefully compare the PVIDs (not the hdiskN names) to make sure the same disk is defined at both ends. It is best to let HACMP discover the resources and then select the appropriate devices.
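To make the PVID comparison concrete, here is a small Python sketch that matches disks across two nodes by PVID. It assumes you have saved the `lspv` output of each node as text, with the hdisk name in the first column and the PVID in the second. The sample listings, PVID values, volume group names, and the helper names `parse_lspv` and `shared_disks` are all invented for illustration and are not taken from this cluster.

```python
def parse_lspv(text):
    """Map PVID -> hdisk name from lspv-style lines (name, PVID, VG, ...)."""
    pvids = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and disks that have no PVID assigned.
        if len(fields) >= 2 and fields[1].lower() != "none":
            pvids[fields[1]] = fields[0]
    return pvids


def shared_disks(lspv_a, lspv_b):
    """Return (PVID, name on node A, name on node B) for disks seen by both."""
    a, b = parse_lspv(lspv_a), parse_lspv(lspv_b)
    return sorted((pvid, a[pvid], b[pvid]) for pvid in a.keys() & b.keys())


# Hypothetical listings: the shared disk is hdisk6 on one node, hdisk5 on the other.
NODE1_LSPV = """\
hdisk0 00c0ffee00000001 rootvg active
hdisk6 00c0ffee00000006 hbvg
"""
NODE2_LSPV = """\
hdisk0 00c0ffee00000002 rootvg active
hdisk5 00c0ffee00000006 hbvg
hdisk6 00c0ffee00000009 datavg
"""

for pvid, name_a, name_b in shared_disks(NODE1_LSPV, NODE2_LSPV):
    print(f"PVID {pvid}: {name_a} on node1, {name_b} on node2")
```

In this made-up example, picking hdisk6 on both nodes would pair two different physical disks, which is exactly the mistake that comparing PVIDs avoids.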
Additionally, please post the output of
lssrc -ls topsvcs
... and please explain the purpose of the IP interface 10.149.34.44 (vnpcluscrm). It seems a bit strange that it is a member of the same network as 1.1.1.1. What are your netmask settings?
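The subnet question can be checked mechanically. The sketch below uses Python's standard `ipaddress` module to turn an AIX-style hexadecimal netmask into a network and to test whether two addresses fall into the same subnet. The addresses and the masks 0xffffffe0 and 0xffffff00 come from the cldump and ifconfig output posted in this thread. The helper names `network_of` and `same_subnet` are made up, and applying 0xffffff00 to 1.1.1.1 is an assumption, since the netmask of the original configuration was not posted.

```python
import ipaddress


def network_of(address, hex_netmask):
    """Return the IPv4 network for an address and a hex netmask such as 0xffffffe0."""
    mask = ipaddress.IPv4Address(int(hex_netmask, 16))
    return ipaddress.ip_interface(f"{address}/{mask}").network


def same_subnet(addr_a, addr_b, hex_netmask):
    """True if both addresses fall into the same network under the given mask."""
    return network_of(addr_a, hex_netmask) == network_of(addr_b, hex_netmask)


# Boot address from the ifconfig output: a /27 with broadcast 192.168.1.31.
boot_net = network_of("192.168.1.1", "0xffffffe0")
print(boot_net, boot_net.broadcast_address)

# The two addresses cldump listed together under net_ether_02.
print(same_subnet("1.1.1.1", "10.149.34.44", "0xffffff00"))

# Persistent address of node1 and the service address.
print(same_subnet("10.149.34.38", "10.149.34.44", "0xffffff00"))
```

Since 1.1.1.1 and 10.149.34.44 already differ in the first octet, no netmask of 255.0.0.0 or longer puts them on one subnet, whereas 10.149.34.44 does share a /24 with 10.149.34.38.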
wmp