jjoz
asked on
Failover cluster failed mysteriously?
Hi,
I don't know why my failover cluster suddenly failed by itself.
In Failover Cluster Management, under Cluster Events, I received critical errors 1135 and 1177:
Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: 15/06/2011 9:07:49 PM
Event ID: 1177
Task Category: None
Level: Critical
Keywords:
User: SYSTEM
Computer: PrintServer01.domain.com
Description:
The Cluster service is shutting down because quorum was lost. This could be due to the loss of network connectivity between some or all nodes in the cluster, or a failover of the witness disk.
Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapter. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: 15/06/2011 9:07:28 PM
Event ID: 1135
Task Category: None
Level: Critical
Keywords:
User: SYSTEM
Computer: PrintServer01.domain.com
Description:
Cluster node 'PrintServer02' was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapters on this node. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
From PrintServer02:
Type : Error
Date : 15/06/2011
Time : 9:08:09 PM
Event : 1205
Source : Microsoft-Windows-FailoverClustering
Category : (3)
User : \SYSTEM
Computer : PrintServer02-VM.domain.com
Description:
The description for Event ID ( 1205 ) in Source ( Microsoft-Windows-FailoverClustering ) could not be found. It contains the following insertion string(s): .
PrintServer03
Type : Error
Date : 15/06/2011
Time : 9:07:45 PM
Event : 1049
Source : Microsoft-Windows-FailoverClustering
Category : (20)
User : \SYSTEM
Computer : PrintServer02-VM.domain.com
Description:
The description for Event ID ( 1049 ) in Source ( Microsoft-Windows-FailoverClustering ) could not be found. It contains the following insertion string(s): .
IP Address 192.168.127.88 --> the IP address of PrintCluster01.domain.com
192.168.127.88
0
Type : Error
Date : 15/06/2011
Time : 9:07:29 PM
Event : 4199
Source : Tcpip
Category : None
User : N/A
Computer : PrintServer02-VM.domain.com
Description:
The description for Event ID ( 4199 ) in Source ( Tcpip ) could not be found. It contains the following insertion string(s): .
192.168.127.142 --> secondary IP of PrintServer01
00-50-56-AE-29-23
From the Primary node:
Type : Error
Date : 15/06/2011
Time : 9:07:49 PM
Event : 7031
Source : Service Control Manager
Category : None
User : N/A
Computer : PrintServer01-VM.domain.com
Description:
The description for Event ID ( 7031 ) in Source ( Service Control Manager ) could not be found. It contains the following insertion string(s): .
Cluster Service
1
60000
1
Restart the service
Type : Error
Date : 15/06/2011
Time : 9:07:49 PM
Event : 7024
Source : Service Control Manager
Category : None
User : N/A
Computer : PrintServer01-VM.domain.com
Description:
The description for Event ID ( 7024 ) in Source ( Service Control Manager ) could not be found. It contains the following insertion string(s): .
Cluster Service
5925 (0x1725)
Type : Error
Date : 15/06/2011
Time : 9:07:42 PM
Event : 1069
Source : Microsoft-Windows-FailoverClustering
Category : (3)
User : \SYSTEM
Computer : PrintServer01-VM.domain.com
Description:
The description for Event ID ( 1069 ) in Source ( Microsoft-Windows-FailoverClustering ) could not be found. It contains the following insertion string(s): .
Quorum
Cluster Group
Type : Error
Date : 15/06/2011
Time : 9:07:35 PM
Event : 1069
Source : Microsoft-Windows-FailoverClustering
Category : (3)
User : \SYSTEM
Computer : PrintServer01-VM.domain.com
Description:
The description for Event ID ( 1069 ) in Source ( Microsoft-Windows-FailoverClustering ) could not be found. It contains the following insertion string(s): .
Quorum
Cluster Group
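The entries above are pasted out of order, so sorting them by time may make the sequence easier to follow. Below is a minimal Python sketch that does only that. The timestamps, event IDs, sources and computer names are copied from the entries above; the helper names (`parse_stamp`, `timeline`) are made up for illustration.

```python
from datetime import datetime

# (timestamp, event ID, source, computer) copied from the event entries above.
EVENTS = [
    ("15/06/2011 9:07:49 PM", 1177, "FailoverClustering", "PrintServer01"),
    ("15/06/2011 9:07:28 PM", 1135, "FailoverClustering", "PrintServer01"),
    ("15/06/2011 9:08:09 PM", 1205, "FailoverClustering", "PrintServer02-VM"),
    ("15/06/2011 9:07:45 PM", 1049, "FailoverClustering", "PrintServer02-VM"),
    ("15/06/2011 9:07:29 PM", 4199, "Tcpip", "PrintServer02-VM"),
    ("15/06/2011 9:07:49 PM", 7031, "Service Control Manager", "PrintServer01-VM"),
    ("15/06/2011 9:07:49 PM", 7024, "Service Control Manager", "PrintServer01-VM"),
    ("15/06/2011 9:07:42 PM", 1069, "FailoverClustering", "PrintServer01-VM"),
    ("15/06/2011 9:07:35 PM", 1069, "FailoverClustering", "PrintServer01-VM"),
]


def parse_stamp(text):
    """Parse the day/month/year 12-hour timestamps used in the event log."""
    return datetime.strptime(text, "%d/%m/%Y %I:%M:%S %p")


def timeline(events):
    """Return the events oldest first (ties keep their pasted order)."""
    return sorted(events, key=lambda e: parse_stamp(e[0]))


for stamp, event_id, source, computer in timeline(EVENTS):
    print(f"{parse_stamp(stamp):%H:%M:%S}  {event_id:<5} {source:<24} {computer}")
```

Sorted this way, the node removal (1135) and the address conflict (4199) come first, one second apart, and the quorum loss (1177) follows about twenty seconds later.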
check storage?
ASKER
Well, it consists of 4 nodes, but it failed over from node 1 to node 2 for some reason.
Yes, I am using a SAN.
It lost access to the quorum. Is this on the SAN?
ASKER
Yes, it uses a physical RDM to the SAN.
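For reference, here is a rough sketch of the vote arithmetic behind "quorum was lost". It assumes a Node and Disk Majority configuration, with the 4 nodes mentioned above plus one witness (quorum) disk. The thread does not state the quorum mode, so that is an assumption, and the function names are illustrative only. The real cluster service also considers which partition owns the witness, which this sketch ignores.

```python
def votes_needed(total_votes):
    """Strict majority of the configured votes."""
    return total_votes // 2 + 1


def has_quorum(nodes_up, witness_online, total_nodes=4, witness_configured=True):
    """Assumed Node and Disk Majority: each node and the witness disk has one vote."""
    total = total_nodes + (1 if witness_configured else 0)
    present = nodes_up + (1 if witness_configured and witness_online else 0)
    return present >= votes_needed(total)


# 4 nodes + 1 witness = 5 votes, so 3 votes are needed.
print(votes_needed(5))
# Two nodes that can still see the witness disk keep quorum...
print(has_quorum(nodes_up=2, witness_online=True))
# ...but the same two nodes lose it if the witness disk becomes unreachable.
print(has_quorum(nodes_up=2, witness_online=False))
```

Under these assumptions, losing either network connectivity to enough nodes or access to the witness disk can tip a partition below the majority, which matches the two causes listed in the event 1177 description.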
Is this still accessible?
Have you run the Best Practices Analyzer and the cluster verification utilities over this cluster?
ASKER
OK, I found something interesting. This is the very first error logged in the Event Viewer on PrintServer02:
Log Name: System
Source: Tcpip
Date: 15/06/2011 9:07:29 PM
Event ID: 4199
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: PrintServer02-VM.domain.com
Description:
The system detected an address conflict for IP address 192.168.127.142 with the system having network hardware address 00-50-56-AE-29-23. Network operations on this system may be disrupted as a result.
192.168.127.142 --> secondary IP of PrintServer01
How can it conflict with the PrintServer01 node? The details are below:
Ethernet adapter Local Area Connection* 8:
Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Microsoft Failover Cluster Virtual Adapter
Physical Address. . . . . . . . . : 02-50-56-AE-29-23
DHCP Enabled. . . . . . . . . . . : No
Autoconfiguration Enabled . . . . : Yes
IPv4 Address. . . . . . . . . . . : 169.254.1.183(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.0.0
Default Gateway . . . . . . . . . :
NetBIOS over Tcpip. . . . . . . . : Enabled
Well, IP address conflicts are never good.
Check that all IP addresses are unique.
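A uniqueness check like this can be scripted. The sketch below is only an illustration: the address lists are the ones from the ipconfig output posted later in this thread for PrintServer01 and PrintServer02. The other two nodes are not shown anywhere in the thread, so they are left out, and the function name is made up.

```python
from collections import defaultdict

# IPv4 addresses per node, taken from the ipconfig output posted in this thread.
# Only two of the four nodes were posted, so the check is incomplete by design.
ADDRESSES = {
    "PrintServer01": [
        "192.168.127.155", "192.168.127.88", "192.168.127.142",
        "192.168.127.143", "192.168.127.144", "10.184.2.2", "169.254.1.183",
    ],
    "PrintServer02": [
        "192.168.127.172", "192.168.127.119", "10.184.2.3", "169.254.2.86",
    ],
}


def find_duplicates(addresses_by_node):
    """Return {ip: [nodes...]} for every address configured on more than one node."""
    owners = defaultdict(list)
    for node, ips in addresses_by_node.items():
        for ip in ips:
            owners[ip].append(node)
    return {ip: nodes for ip, nodes in owners.items() if len(nodes) > 1}


print(find_duplicates(ADDRESSES) or "no duplicates in the posted output")
```

A clean result here only says the static configuration does not overlap at the moment the output was captured. It does not rule out a clustered IP resource being brought online on a second node while the first node still holds it.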
ASKER
The cluster is accessible again after I manually forced it to fail back to PrintServer01, but of course that defeats the purpose of having a failover cluster :-|
FYI: PrintServer02 holds the quorum disk, which has 98% free disk space.
Where can I find the "Best Practices Analyzer" for clusters?
It's now included with 2008 R2.
ASKER
Ah, I am using 2008 Enterprise at the moment. Is the cluster "Validate" feature the same as the cluster Best Practices Analyzer?
I'm about to run the cluster configuration validation, but I'm worried that running it during business hours will cause disruption. Is it OK to run?
No, don't do it during business hours.
Schedule it for out of hours.
ASKER
Many thanks for the reply, mate.
However, I'm sure that the IPs are static, not assigned by DHCP, and the IPCONFIG results below show no duplicate IPs.
I'm still unclear as to why it conflicts with itself.
From PrintServer02
Windows IP Configuration
Host Name . . . . . . . . . . . . : PrintServer02
Primary Dns Suffix . . . . . . . : domain.com
Node Type . . . . . . . . . . . . : Hybrid
IP Routing Enabled. . . . . . . . : No
WINS Proxy Enabled. . . . . . . . : No
DNS Suffix Search List. . . . . . : domain.com
domain.com.au
Ethernet adapter Local Area Connection* 8:
Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Microsoft Failover Cluster Virtual Adapter
Physical Address. . . . . . . . . : 02-50-56-AE-5F-E5
DHCP Enabled. . . . . . . . . . . : No
Autoconfiguration Enabled . . . . : Yes
IPv4 Address. . . . . . . . . . . : 169.254.2.86(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.0.0
Default Gateway . . . . . . . . . :
NetBIOS over Tcpip. . . . . . . . : Enabled
Ethernet adapter Cluster Public Network:
Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Intel® PRO/1000 MT Network Connection
Physical Address. . . . . . . . . : 00-50-56-AE-79-FA
DHCP Enabled. . . . . . . . . . . : No
Autoconfiguration Enabled . . . . : Yes
IPv4 Address. . . . . . . . . . . : 192.168.127.172(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.255.0
IPv4 Address. . . . . . . . . . . : 192.168.127.119(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.255.0
Default Gateway . . . . . . . . . : 192.168.127.254
DNS Servers . . . . . . . . . . . : 192.168.127.10
192.168.127.11
Primary WINS Server . . . . . . . : 192.168.127.11
Secondary WINS Server . . . . . . : 192.168.127.10
NetBIOS over Tcpip. . . . . . . . : Enabled
Ethernet adapter Cluster Private Network:
Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Intel® PRO/1000 MT Network Connection #2
Physical Address. . . . . . . . . : 00-50-56-AE-77-8D
DHCP Enabled. . . . . . . . . . . : No
Autoconfiguration Enabled . . . . : Yes
IPv4 Address. . . . . . . . . . . : 10.184.2.3(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.255.0
Default Gateway . . . . . . . . . :
NetBIOS over Tcpip. . . . . . . . : Disabled
From PrintServer01 (the Active Node)
Windows IP Configuration
Host Name . . . . . . . . . . . . : PrintServer01
Primary Dns Suffix . . . . . . . : domain.com
Node Type . . . . . . . . . . . . : Hybrid
IP Routing Enabled. . . . . . . . : No
WINS Proxy Enabled. . . . . . . . : No
DNS Suffix Search List. . . . . . : domain.com
domain.com.au
Ethernet adapter Local Area Connection* 8:
Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Microsoft Failover Cluster Virtual Adapter
Physical Address. . . . . . . . . : 02-50-56-AE-29-23
DHCP Enabled. . . . . . . . . . . : No
Autoconfiguration Enabled . . . . : Yes
IPv4 Address. . . . . . . . . . . : 169.254.1.183(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.0.0
Default Gateway . . . . . . . . . :
NetBIOS over Tcpip. . . . . . . . : Enabled
Ethernet adapter Cluster Public Network:
Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Intel® PRO/1000 MT Network Connection
Physical Address. . . . . . . . . : 00-50-56-AE-29-23
DHCP Enabled. . . . . . . . . . . : No
Autoconfiguration Enabled . . . . : Yes
IPv4 Address. . . . . . . . . . . : 192.168.127.155(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.255.0
IPv4 Address. . . . . . . . . . . : 192.168.127.88(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.255.0
IPv4 Address. . . . . . . . . . . : 192.168.127.142(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.255.0
IPv4 Address. . . . . . . . . . . : 192.168.127.143(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.255.0
IPv4 Address. . . . . . . . . . . : 192.168.127.144(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.255.0
Default Gateway . . . . . . . . . : 192.168.127.254
DNS Servers . . . . . . . . . . . : 192.168.127.10
192.168.127.11
Primary WINS Server . . . . . . . : 192.168.127.10
Secondary WINS Server . . . . . . : 192.168.127.11
NetBIOS over Tcpip. . . . . . . . : Enabled
Ethernet adapter Cluster Private Network:
Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Intel® PRO/1000 MT Network Connection #2
Physical Address. . . . . . . . . : 00-50-56-AE-43-EC
DHCP Enabled. . . . . . . . . . . : No
Autoconfiguration Enabled . . . . : Yes
IPv4 Address. . . . . . . . . . . : 10.184.2.2(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.255.0
Default Gateway . . . . . . . . . :
NetBIOS over Tcpip. . . . . . . . : Disabled
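To see which adapter owns the hardware address named in event 4199, the ipconfig text can be parsed and searched. The following is a minimal sketch, assuming the English `ipconfig /all` layout shown above. `SAMPLE` is a shortened copy of the PrintServer01 output, and the function names are hypothetical.

```python
import re

# Shortened copy of the PrintServer01 output above.
SAMPLE = """\
Ethernet adapter Local Area Connection* 8:
   Physical Address. . . . . . . . . : 02-50-56-AE-29-23
   IPv4 Address. . . . . . . . . . . : 169.254.1.183(Preferred)
Ethernet adapter Cluster Public Network:
   Physical Address. . . . . . . . . : 00-50-56-AE-29-23
   IPv4 Address. . . . . . . . . . . : 192.168.127.155(Preferred)
   IPv4 Address. . . . . . . . . . . : 192.168.127.142(Preferred)
Ethernet adapter Cluster Private Network:
   Physical Address. . . . . . . . . : 00-50-56-AE-43-EC
   IPv4 Address. . . . . . . . . . . : 10.184.2.2(Preferred)
"""


def parse_ipconfig(text):
    """Return {adapter name: {"mac": str or None, "ips": [str, ...]}}."""
    adapters, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        header = re.match(r"^Ethernet adapter (.+):$", line)
        if header:
            current = adapters.setdefault(header.group(1), {"mac": None, "ips": []})
            continue
        if current is None:
            continue
        mac = re.match(r"^Physical Address[ .]*: ([0-9A-Fa-f-]{17})$", line)
        if mac:
            current["mac"] = mac.group(1).upper()
        ip = re.match(r"^IPv4 Address[ .]*: (\d+\.\d+\.\d+\.\d+)", line)
        if ip:
            current["ips"].append(ip.group(1))
    return adapters


def owner_of(mac, adapters):
    """Names of the adapters whose physical address equals `mac`."""
    return [name for name, info in adapters.items() if info["mac"] == mac.upper()]


def is_locally_administered(mac):
    """True when the locally administered bit (0x02) of the first octet is set."""
    return bool(int(mac.split("-")[0], 16) & 0x02)


adapters = parse_ipconfig(SAMPLE)
print(owner_of("00-50-56-AE-29-23", adapters))
print(adapters["Cluster Public Network"]["ips"])
print(is_locally_administered("02-50-56-AE-29-23"))
```

In the posted output, the address from the event matches PrintServer01's "Cluster Public Network" adapter, which is also the adapter holding 192.168.127.142. The Failover Cluster Virtual Adapter's address differs from it only in the first octet (02 instead of 00), that is, in the locally administered bit.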
ASKER
I've run the test, and so far everything comes back good with green check marks. The only warning is because I didn't select the disk testing options.
Philip