jjoz asked:

Failover cluster failed mysteriously?

Hi,

I don't know why, all of a sudden, my failover cluster failed by itself.
In Failover Cluster Management, under Cluster Events, I received critical error messages 1135 and 1177:

Log Name:      System
Source:        Microsoft-Windows-FailoverClustering
Date:          15/06/2011 9:07:49 PM
Event ID:      1177
Task Category: None
Level:         Critical
Keywords:      
User:          SYSTEM
Computer:      PrintServer01.domain.com
Description:
The Cluster service is shutting down because quorum was lost. This could be due to the loss of network connectivity between some or all nodes in the cluster, or a failover of the witness disk. 
Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapter. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.

Log Name:      System
Source:        Microsoft-Windows-FailoverClustering
Date:          15/06/2011 9:07:28 PM
Event ID:      1135
Task Category: None
Level:         Critical
Keywords:      
User:          SYSTEM
Computer:      PrintServer01.domain.com
Description:
Cluster node 'PrintServer02' was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapters on this node. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.


From PrintServer02:


Type :		Error
Date :		15/06/2011
Time :		9:08:09 PM
Event :		1205
Source :		Microsoft-Windows-FailoverClustering
Category :	(3)
User :		\SYSTEM
Computer :	PrintServer02-VM.domain.com
Description:
The description for Event ID ( 1205 ) in Source ( Microsoft-Windows-FailoverClustering ) could not be found. It contains the following insertion string(s): .
PrintServer03

Type :		Error
Date :		15/06/2011
Time :		9:07:45 PM
Event :		1049
Source :		Microsoft-Windows-FailoverClustering
Category :	(20)
User :		\SYSTEM
Computer :	PrintServer02-VM.domain.com
Description:
The description for Event ID ( 1049 ) in Source ( Microsoft-Windows-FailoverClustering ) could not be found. It contains the following insertion string(s): .
IP Address 192.168.127.88 --> the IP address of PrintCluster01.domain.com
192.168.127.88
0

Type :		Error
Date :		15/06/2011
Time :		9:07:29 PM
Event :		4199
Source :		Tcpip
Category :	None
User :		N/A
Computer :	PrintServer02-VM.domain.com
Description:
The description for Event ID ( 4199 ) in Source ( Tcpip ) could not be found. It contains the following insertion string(s): .

192.168.127.142 --> secondary IP of PrintServer01
00-50-56-AE-29-23


From the Primary node:


Type :		Error
Date :		15/06/2011
Time :		9:07:49 PM
Event :		7031
Source :		Service Control Manager
Category :	None
User :		N/A
Computer :	PrintServer01-VM.domain.com
Description:
The description for Event ID ( 7031 ) in Source ( Service Control Manager ) could not be found. It contains the following insertion string(s): .
Cluster Service
1
60000
1
Restart the service

Type :		Error
Date :		15/06/2011
Time :		9:07:49 PM
Event :		7024
Source :		Service Control Manager
Category :	None
User :		N/A
Computer :	PrintServer01-VM.domain.com
Description:
The description for Event ID ( 7024 ) in Source ( Service Control Manager ) could not be found. It contains the following insertion string(s): .
Cluster Service
5925 (0x1725)

Type :		Error
Date :		15/06/2011
Time :		9:07:42 PM
Event :		1069
Source :		Microsoft-Windows-FailoverClustering
Category :	(3)
User :		\SYSTEM
Computer :	PrintServer01-VM.domain.com
Description:
The description for Event ID ( 1069 ) in Source ( Microsoft-Windows-FailoverClustering ) could not be found. It contains the following insertion string(s): .
Quorum
Cluster Group

Type :		Error
Date :		15/06/2011
Time :		9:07:35 PM
Event :		1069
Source :		Microsoft-Windows-FailoverClustering
Category :	(3)
User :		\SYSTEM
Computer :	PrintServer01-VM.domain.com
Description:
The description for Event ID ( 1069 ) in Source ( Microsoft-Windows-FailoverClustering ) could not be found. It contains the following insertion string(s): .
Quorum
Cluster Group
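
To see the order in which things went wrong, the event excerpts from both nodes can be merged into one timeline. Below is a minimal sketch in Python. It assumes the clocks on the two nodes are roughly in sync, which the thread does not confirm. The short node labels and the `build_timeline` helper are illustrative names only, not part of any Windows tool.

```python
from datetime import datetime

# (node, timestamp, event ID, source) copied from the log excerpts in this thread.
EVENTS = [
    ("PrintServer01", "15/06/2011 9:07:49 PM", 1177, "FailoverClustering"),
    ("PrintServer01", "15/06/2011 9:07:28 PM", 1135, "FailoverClustering"),
    ("PrintServer02", "15/06/2011 9:08:09 PM", 1205, "FailoverClustering"),
    ("PrintServer02", "15/06/2011 9:07:45 PM", 1049, "FailoverClustering"),
    ("PrintServer02", "15/06/2011 9:07:29 PM", 4199, "Tcpip"),
    ("PrintServer01", "15/06/2011 9:07:49 PM", 7031, "Service Control Manager"),
    ("PrintServer01", "15/06/2011 9:07:49 PM", 7024, "Service Control Manager"),
    ("PrintServer01", "15/06/2011 9:07:42 PM", 1069, "FailoverClustering"),
    ("PrintServer01", "15/06/2011 9:07:35 PM", 1069, "FailoverClustering"),
]


def build_timeline(events):
    """Return the events sorted oldest-first by their parsed timestamp."""
    def when(event):
        # Day/month/year with a 12-hour clock, as shown in the logs above.
        return datetime.strptime(event[1], "%d/%m/%Y %I:%M:%S %p")
    return sorted(events, key=when)


if __name__ == "__main__":
    for node, stamp, event_id, source in build_timeline(EVENTS):
        print(f"{stamp}  {node:<14} {event_id:>5}  {source}")
```

Sorted this way, event 1135 on PrintServer01 (9:07:28 PM) and Tcpip event 4199 on PrintServer02 (9:07:29 PM) are the earliest entries, ahead of the Quorum resource failures (1069) and the quorum loss (1177).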


Philip Elder

2 nodes?

Philip
jjoz

ASKER

Well, it consists of 4 nodes, but it failed over from node 1 to node 2 for some reason.
Yes, I am using a SAN.
jjoz

ASKER

Yes, it is using physical RDMs to the SAN.
Have you run the Best Practices Analyzer over this cluster, and the cluster verification utilities?
jjoz

ASKER

OK, I found an interesting error here. This is the very first error logged in Event Viewer on PrintServer02:

Log Name:      System
Source:        Tcpip
Date:          15/06/2011 9:07:29 PM
Event ID:      4199
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      PrintServer02-VM.domain.com
Description:
The system detected an address conflict for IP address 192.168.127.142 with the system having network hardware address 00-50-56-AE-29-23. Network operations on this system may be disrupted as a result.

192.168.127.142 --> secondary IP of PrintServer01



How come it conflicts with an adapter on the PrintServer01 node? The details are below:

Ethernet adapter Local Area Connection* 8:

   Connection-specific DNS Suffix  . :
   Description . . . . . . . . . . . : Microsoft Failover Cluster Virtual Adapter
   Physical Address. . . . . . . . . : 02-50-56-AE-29-23
   DHCP Enabled. . . . . . . . . . . : No
   Autoconfiguration Enabled . . . . : Yes
   IPv4 Address. . . . . . . . . . . : 169.254.1.183(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.0.0
   Default Gateway . . . . . . . . . :
   NetBIOS over Tcpip. . . . . . . . : Enabled
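
Note that the hardware address reported in event 4199 (00-50-56-AE-29-23) and the one on the virtual adapter above (02-50-56-AE-29-23) are close but not identical. A small sketch in Python to compare them is below. The helper names are made up for illustration. The only outside fact used is that bit 0x02 of the first octet of a MAC address marks it as locally administered.

```python
def mac_to_bytes(mac):
    """Convert 'AA-BB-CC-DD-EE-FF' (or colon-separated) into six bytes."""
    parts = mac.replace(":", "-").split("-")
    if len(parts) != 6:
        raise ValueError(f"not a 6-octet MAC address: {mac!r}")
    return bytes(int(part, 16) for part in parts)


def is_locally_administered(mac):
    """True when the locally administered bit (0x02 of the first octet) is set."""
    return bool(mac_to_bytes(mac)[0] & 0x02)


def differing_octets(mac_a, mac_b):
    """Return the zero-based octet positions where the two addresses differ."""
    a, b = mac_to_bytes(mac_a), mac_to_bytes(mac_b)
    return [i for i in range(6) if a[i] != b[i]]


if __name__ == "__main__":
    conflict_mac = "00-50-56-AE-29-23"  # from event 4199 on PrintServer02
    virtual_mac = "02-50-56-AE-29-23"   # Microsoft Failover Cluster Virtual Adapter
    print("octets that differ:", differing_octets(conflict_mac, virtual_mac))
    print("event MAC locally administered:", is_locally_administered(conflict_mac))
    print("virtual MAC locally administered:", is_locally_administered(virtual_mac))
```

The two addresses differ only in the first octet, so the address named in the event is not the one on this virtual adapter. Which adapter it does belong to has to come from the full ipconfig output of each node.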


jjoz

ASKER

The cluster is now accessible after I manually forced it to fail back to PrintServer01, but of course that defeats the purpose of having a failover cluster :-|

FYI: PrintServer02 holds the quorum disk, which has 98% free disk space.
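
Since event 1177 is about losing quorum, here is a rough sketch in Python of the vote arithmetic. It assumes the cluster runs in Node and Disk Majority mode, where each node and the quorum (witness) disk hold one vote. The thread does not actually state which quorum mode is configured, so treat that as an assumption. The function names are illustrative only.

```python
def votes_needed(total_votes):
    """Smallest number of votes that is a strict majority of total_votes."""
    return total_votes // 2 + 1


def has_quorum(nodes_up, witness_online, total_nodes):
    """Majority check, ASSUMING every node and the witness disk hold one vote each."""
    total_votes = total_nodes + 1  # +1 for the witness (quorum) disk
    votes_present = nodes_up + (1 if witness_online else 0)
    return votes_present >= votes_needed(total_votes)


if __name__ == "__main__":
    # 4 nodes plus the witness disk gives 5 votes, so 3 are needed.
    print(votes_needed(5))
    # Two nodes that can still see the witness disk keep quorum...
    print(has_quorum(nodes_up=2, witness_online=True, total_nodes=4))
    # ...but the same two nodes without the witness disk do not.
    print(has_quorum(nodes_up=2, witness_online=False, total_nodes=4))
```

Under that assumption, a group of nodes that loses contact with the others and with the witness disk at the same time can drop below the majority even though most nodes are still running.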

Where can I find the "best practice analyzer for cluster"?
jjoz

ASKER

Ah, I am using 2008 Enterprise now. Is the "Validate" cluster feature the same as the cluster Best Practices Analyzer?

I'm about to run the cluster configuration validation, but I'm worried it will cause disruption if I run it during business hours. Is it OK to run?
ASKER CERTIFIED SOLUTION
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

No, don't run it during business hours.

Schedule it for out of hours.
jjoz

ASKER

Many thanks for the reply, mate.

However, I'm sure that the IP is static and not assigned by DHCP, as the IPCONFIG results below show, and there are no duplicate IPs:

I'm still unclear as to why it conflicts with itself.
From PrintServer02
Windows IP Configuration

Host Name . . . . . . . . . . . . : PrintServer02
Primary Dns Suffix . . . . . . . : domain.com
Node Type . . . . . . . . . . . . : Hybrid
IP Routing Enabled. . . . . . . . : No
WINS Proxy Enabled. . . . . . . . : No
DNS Suffix Search List. . . . . . : domain.com
domain.com.au

Ethernet adapter Local Area Connection* 8:

Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Microsoft Failover Cluster Virtual Adapter
Physical Address. . . . . . . . . : 02-50-56-AE-5F-E5
DHCP Enabled. . . . . . . . . . . : No
Autoconfiguration Enabled . . . . : Yes
IPv4 Address. . . . . . . . . . . : 169.254.2.86(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.0.0
Default Gateway . . . . . . . . . :
NetBIOS over Tcpip. . . . . . . . : Enabled

Ethernet adapter Cluster Public Network:

Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Intel® PRO/1000 MT Network Connection
Physical Address. . . . . . . . . : 00-50-56-AE-79-FA
DHCP Enabled. . . . . . . . . . . : No
Autoconfiguration Enabled . . . . : Yes
IPv4 Address. . . . . . . . . . . : 192.168.127.172(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.255.0
IPv4 Address. . . . . . . . . . . : 192.168.127.119(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.255.0
Default Gateway . . . . . . . . . : 192.168.127.254
DNS Servers . . . . . . . . . . . : 192.168.127.10
192.168.127.11
Primary WINS Server . . . . . . . : 192.168.127.11
Secondary WINS Server . . . . . . : 192.168.127.10
NetBIOS over Tcpip. . . . . . . . : Enabled

Ethernet adapter Cluster Private Network:

Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Intel® PRO/1000 MT Network Connection #2
Physical Address. . . . . . . . . : 00-50-56-AE-77-8D
DHCP Enabled. . . . . . . . . . . : No
Autoconfiguration Enabled . . . . : Yes
IPv4 Address. . . . . . . . . . . : 10.184.2.3(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.255.0
Default Gateway . . . . . . . . . :
NetBIOS over Tcpip. . . . . . . . : Disabled


From PrintServer01 (the Active Node)
Windows IP Configuration

Host Name . . . . . . . . . . . . : PrintServer01
Primary Dns Suffix . . . . . . . : domain.com
Node Type . . . . . . . . . . . . : Hybrid
IP Routing Enabled. . . . . . . . : No
WINS Proxy Enabled. . . . . . . . : No
DNS Suffix Search List. . . . . . : domain.com
domain.com.au

Ethernet adapter Local Area Connection* 8:

Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Microsoft Failover Cluster Virtual Adapter
Physical Address. . . . . . . . . : 02-50-56-AE-29-23
DHCP Enabled. . . . . . . . . . . : No
Autoconfiguration Enabled . . . . : Yes
IPv4 Address. . . . . . . . . . . : 169.254.1.183(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.0.0
Default Gateway . . . . . . . . . :
NetBIOS over Tcpip. . . . . . . . : Enabled

Ethernet adapter Cluster Public Network:

Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Intel® PRO/1000 MT Network Connection
Physical Address. . . . . . . . . : 00-50-56-AE-29-23
DHCP Enabled. . . . . . . . . . . : No
Autoconfiguration Enabled . . . . : Yes
IPv4 Address. . . . . . . . . . . : 192.168.127.155(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.255.0
IPv4 Address. . . . . . . . . . . : 192.168.127.88(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.255.0
IPv4 Address. . . . . . . . . . . : 192.168.127.142(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.255.0
IPv4 Address. . . . . . . . . . . : 192.168.127.143(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.255.0
IPv4 Address. . . . . . . . . . . : 192.168.127.144(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.255.0
Default Gateway . . . . . . . . . : 192.168.127.254
DNS Servers . . . . . . . . . . . : 192.168.127.10
192.168.127.11
Primary WINS Server . . . . . . . : 192.168.127.10
Secondary WINS Server . . . . . . : 192.168.127.11
NetBIOS over Tcpip. . . . . . . . : Enabled

Ethernet adapter Cluster Private Network:

Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Intel® PRO/1000 MT Network Connection #2
Physical Address. . . . . . . . . : 00-50-56-AE-43-EC
DHCP Enabled. . . . . . . . . . . : No
Autoconfiguration Enabled . . . . : Yes
IPv4 Address. . . . . . . . . . . : 10.184.2.2(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.255.0
Default Gateway . . . . . . . . . :
NetBIOS over Tcpip. . . . . . . . : Disabled
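
To double-check the two listings above, the ipconfig text can be parsed and searched for the address and MAC named in event 4199. Here is a minimal sketch in Python. The samples are trimmed to the adapter, Physical Address, and IPv4 Address lines from this thread, and `parse_ipconfig`, `find_owner`, and `duplicate_ips` are hypothetical helper names, not an existing tool.

```python
import re
from collections import defaultdict

ADAPTER_RE = re.compile(r"^Ethernet adapter (.+):\s*$")
MAC_RE = re.compile(r"Physical Address[ .]*:\s*([0-9A-Fa-f-]{17})")
IPV4_RE = re.compile(r"IPv4 Address[ .]*:\s*(\d+\.\d+\.\d+\.\d+)")

def parse_ipconfig(text):
    """Return {adapter name: {"mac": str or None, "ips": [str, ...]}}."""
    adapters, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        header = ADAPTER_RE.match(line)
        if header:
            current = adapters.setdefault(header.group(1), {"mac": None, "ips": []})
        elif current is not None:
            mac, ip = MAC_RE.search(line), IPV4_RE.search(line)
            if mac:
                current["mac"] = mac.group(1).upper()
            if ip:
                current["ips"].append(ip.group(1))
    return adapters

def find_owner(nodes, mac=None, ip=None):
    """List (node, adapter) pairs whose MAC or IPv4 address matches."""
    return [(node, name)
            for node, adapters in nodes.items()
            for name, info in adapters.items()
            if (mac and info["mac"] == mac.upper()) or (ip and ip in info["ips"])]

def duplicate_ips(nodes):
    """Return {ip: [(node, adapter), ...]} for addresses listed more than once."""
    seen = defaultdict(list)
    for node, adapters in nodes.items():
        for name, info in adapters.items():
            for ip in info["ips"]:
                seen[ip].append((node, name))
    return {ip: owners for ip, owners in seen.items() if len(owners) > 1}

# Trimmed copies of the ipconfig output posted above.
PRINTSERVER01 = """
Ethernet adapter Local Area Connection* 8:
Physical Address. . . . . . . . . : 02-50-56-AE-29-23
IPv4 Address. . . . . . . . . . . : 169.254.1.183(Preferred)
Ethernet adapter Cluster Public Network:
Physical Address. . . . . . . . . : 00-50-56-AE-29-23
IPv4 Address. . . . . . . . . . . : 192.168.127.155(Preferred)
IPv4 Address. . . . . . . . . . . : 192.168.127.88(Preferred)
IPv4 Address. . . . . . . . . . . : 192.168.127.142(Preferred)
IPv4 Address. . . . . . . . . . . : 192.168.127.143(Preferred)
IPv4 Address. . . . . . . . . . . : 192.168.127.144(Preferred)
Ethernet adapter Cluster Private Network:
Physical Address. . . . . . . . . : 00-50-56-AE-43-EC
IPv4 Address. . . . . . . . . . . : 10.184.2.2(Preferred)
"""
PRINTSERVER02 = """
Ethernet adapter Local Area Connection* 8:
Physical Address. . . . . . . . . : 02-50-56-AE-5F-E5
IPv4 Address. . . . . . . . . . . : 169.254.2.86(Preferred)
Ethernet adapter Cluster Public Network:
Physical Address. . . . . . . . . : 00-50-56-AE-79-FA
IPv4 Address. . . . . . . . . . . : 192.168.127.172(Preferred)
IPv4 Address. . . . . . . . . . . : 192.168.127.119(Preferred)
Ethernet adapter Cluster Private Network:
Physical Address. . . . . . . . . : 00-50-56-AE-77-8D
IPv4 Address. . . . . . . . . . . : 10.184.2.3(Preferred)
"""
NODES = {"PrintServer01": parse_ipconfig(PRINTSERVER01),
         "PrintServer02": parse_ipconfig(PRINTSERVER02)}

if __name__ == "__main__":
    print(find_owner(NODES, mac="00-50-56-AE-29-23"))
    print(find_owner(NODES, ip="192.168.127.142"))
    print(duplicate_ips(NODES))
```

On these two listings, both the MAC and the IP from event 4199 resolve to the Cluster Public Network adapter on PrintServer01, and no address appears twice. Keep in mind the listings were taken after the manual fail back, so they do not show which node held 192.168.127.142 at 9:07:29 PM.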


jjoz

ASKER

I've run the test, and so far everything comes back good with green checkboxes. The only warning is that I didn't select the disk testing options.