Techie solution

asked on

Windows Cluster 2016 error: Node 1 unavailable after a restart

Windows Cluster 2016 error: Node 1 is unavailable after a restart (after installing Patch 1 for the SQL cluster). I pinged the IP and it responds, and I am able to RDP to the server. All disks are up and running; the only issue is that Node 1's network is showing as unavailable.
Please advise.
lcohan

So Node 1 was restarted after installing "Patch 1 for SQL cluster," and it came back up, right?
What exactly is "Patch 1 for SQL cluster" — was that SQL Server SP1, or something else?
Was the cluster running ALL resources on Node 2 prior to the patching/restart of Node 1?
If you RDP into Node 2, start Failover Cluster Manager on it, and connect to your cluster — what errors, if any, appear in the last 24 hours?
You should also right-click your cluster name there and run a "Validate Cluster..." report, which will show any potential issues with details.
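If you prefer PowerShell, the same validation and event review can be run from an elevated prompt on either node. A minimal sketch — "SQLCLUSTER" is a placeholder, substitute your actual cluster name:

# Run cluster validation (equivalent of "Validate Cluster..." in FCM)
Test-Cluster -Cluster SQLCLUSTER

# Review cluster-related events from the last 24 hours on this node
Get-WinEvent -FilterHashtable @{
    LogName   = 'Microsoft-Windows-FailoverClustering/Operational'
    StartTime = (Get-Date).AddDays(-1)
} | Select-Object TimeCreated, Id, LevelDisplayName, Message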
Techie solution

ASKER

It was running prior to the reboot. There is an error with event ID 5398.
In an elevated PowerShell session, please post the results of the following in a CODE snippet:
Get-NetAdapter
Get-NetIPAddress


PS C:\WINDOWS\system32> Get-NetAdapter

Name                      InterfaceDescription                    ifIndex Status       MacAddress             LinkSpeed
----                      --------------------                    ------- ------       ----------             ---------
Cluster                   Intel(R) 82574L Gigabit Network Co...#2       3 Up           00-50-xx.xx.xx-D0         1 Gbps
Production                Intel(R) 82574L Gigabit Network Conn...       4 Up           00-50-xx.xx.xx-99         1 Gbps



PS C:\WINDOWS\system32> Get-NetIPAddress


IPAddress         : fe80::b1f4:d8a3:a2d5:efe0%3
InterfaceIndex    : 3
InterfaceAlias    : Cluster
AddressFamily     : IPv6
Type              : Unicast
PrefixLength      : 64
PrefixOrigin      : WellKnown
SuffixOrigin      : Link
AddressState      : Preferred
ValidLifetime     : Infinite ([TimeSpan]::MaxValue)
PreferredLifetime : Infinite ([TimeSpan]::MaxValue)
SkipAsSource      : False
PolicyStore       : ActiveStore

IPAddress         : fe80::878:6c1:f868:6c66%2
InterfaceIndex    : 2
InterfaceAlias    : Local Area Connection* 2
AddressFamily     : IPv6
Type              : Unicast
PrefixLength      : 64
PrefixOrigin      : WellKnown
SuffixOrigin      : Link
AddressState      : Deprecated
ValidLifetime     : Infinite ([TimeSpan]::MaxValue)
PreferredLifetime : Infinite ([TimeSpan]::MaxValue)
SkipAsSource      : False
PolicyStore       : ActiveStore

IPAddress         : fe80::f18a:98:3a1c:ff4e%4
InterfaceIndex    : 4
InterfaceAlias    : Production
AddressFamily     : IPv6
Type              : Unicast
PrefixLength      : 64
PrefixOrigin      : WellKnown
SuffixOrigin      : Link
AddressState      : Preferred
ValidLifetime     : Infinite ([TimeSpan]::MaxValue)
PreferredLifetime : Infinite ([TimeSpan]::MaxValue)
SkipAsSource      : False
PolicyStore       : ActiveStore

IPAddress         : ::1
InterfaceIndex    : 1
InterfaceAlias    : Loopback Pseudo-Interface 1
AddressFamily     : IPv6
Type              : Unicast
PrefixLength      : 128
PrefixOrigin      : WellKnown
SuffixOrigin      : WellKnown
AddressState      : Preferred
ValidLifetime     : Infinite ([TimeSpan]::MaxValue)
PreferredLifetime : Infinite ([TimeSpan]::MaxValue)
SkipAsSource      : False
PolicyStore       : ActiveStore

IPAddress         : 192.xx.xx.2
InterfaceIndex    : 3
InterfaceAlias    : Cluster
AddressFamily     : IPv4
Type              : Unicast
PrefixLength      : 24
PrefixOrigin      : Manual
SuffixOrigin      : Manual
AddressState      : Preferred
ValidLifetime     : Infinite ([TimeSpan]::MaxValue)
PreferredLifetime : Infinite ([TimeSpan]::MaxValue)
SkipAsSource      : False
PolicyStore       : ActiveStore

IPAddress         : 169.xx.xx.102
InterfaceIndex    : 2
InterfaceAlias    : Local Area Connection* 2
AddressFamily     : IPv4
Type              : Unicast
PrefixLength      : 16
PrefixOrigin      : WellKnown
SuffixOrigin      : Link
AddressState      : Tentative
ValidLifetime     : Infinite ([TimeSpan]::MaxValue)
PreferredLifetime : Infinite ([TimeSpan]::MaxValue)
SkipAsSource      : False
PolicyStore       : ActiveStore

IPAddress         : 169.xx.xx.228
InterfaceIndex    : 2
InterfaceAlias    : Local Area Connection* 2
AddressFamily     : IPv4
Type              : Unicast
PrefixLength      : 16
PrefixOrigin      : Manual
SuffixOrigin      : Manual
AddressState      : Tentative
ValidLifetime     : Infinite ([TimeSpan]::MaxValue)
PreferredLifetime : Infinite ([TimeSpan]::MaxValue)
SkipAsSource      : False
PolicyStore       : ActiveStore

IPAddress         : 10.xx.xx.161
InterfaceIndex    : 4
InterfaceAlias    : Production
AddressFamily     : IPv4
Type              : Unicast
PrefixLength      : 24
PrefixOrigin      : Manual
SuffixOrigin      : Manual
AddressState      : Preferred
ValidLifetime     : Infinite ([TimeSpan]::MaxValue)
PreferredLifetime : Infinite ([TimeSpan]::MaxValue)
SkipAsSource      : False
PolicyStore       : ActiveStore

IPAddress         : 127.0.0.1
InterfaceIndex    : 1
InterfaceAlias    : Loopback Pseudo-Interface 1
AddressFamily     : IPv4
Type              : Unicast
PrefixLength      : 8
PrefixOrigin      : WellKnown
SuffixOrigin      : WellKnown
AddressState      : Preferred
ValidLifetime     : Infinite ([TimeSpan]::MaxValue)
PreferredLifetime : Infinite ([TimeSpan]::MaxValue)
SkipAsSource      : False
PolicyStore       : ActiveStore
I don't see even a NetLbfoTeam (NIC team) here?

It's preferable to set up a NIC team and bind the virtual switch to that team; in this case, with only two ports, the host OS would share that team and thus get an IP on the production network.

In a cluster setting, DNS A-record registration for production should come only from the IP of the management adapter.

To do so:
Set-DnsClient -InterfaceAlias Cl* -RegisterThisConnectionsAddress $False



A SET switch (Switch Embedded Teaming) would allow creating the two virtual NICs (vNICs) needed to run on the two subnets.
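A minimal sketch of creating such a SET switch with two host vNICs, assuming the physical adapters are named "Production" and "Cluster" as in the output above (switch and vNIC names are placeholders):

# Create a Switch Embedded Teaming (SET) switch over both physical ports
New-VMSwitch -Name "vSwitch-SET" -NetAdapterName "Production","Cluster" `
    -EnableEmbeddedTeaming $true -AllowManagementOS $false

# Add a host vNIC for each subnet
Add-VMNetworkAdapter -ManagementOS -Name "vNIC-Production" -SwitchName "vSwitch-SET"
Add-VMNetworkAdapter -ManagementOS -Name "vNIC-Cluster"    -SwitchName "vSwitch-SET"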
This configuration was not needed initially; it was working fine.
Did you look at the Failover Cluster Manager messages and try a "Validate Cluster..." report?
Any guidance please.
There are obvious errors that a systems/networking engineer would have to look at and fix before running cluster validation again and bringing it back online as a "cluster." Hopefully everything is still running well on Node 2, right? Just make sure automatic failover is disabled on it until Node 1 is brought back into the cluster. Alternatively, as this is a VM: do you by any chance have a snapshot of Node 1 taken just before the patch was applied? If yes, you could try rolling back to that snapshot on Node 1.
No, we don't have any snapshots. Also, can you guide me through how to troubleshoot this?
The log settings we use to troubleshoot are in this blog post: A Microsoft Cluster Troubleshooting Guide

Both FCM and PowerShell are useful for finding the problem.
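As a starting point on the PowerShell side, a sketch of gathering the cluster debug log and checking the event ID reported earlier (5398); the destination path is a placeholder:

# Generate the cluster debug log for the last 4 hours (240 minutes)
Get-ClusterLog -TimeSpan 240 -Destination C:\Temp

# Look for the reported event ID (5398); FailoverClustering events
# are typically surfaced in the System log
Get-WinEvent -FilterHashtable @{
    LogName = 'System'
    Id      = 5398
} -MaxEvents 10 | Format-List TimeCreated, Message

# Check node and network state as the cluster sees it
Get-ClusterNode
Get-ClusterNetwork
Get-ClusterNetworkInterface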


Message Analyzer can also be tuned to help in 2012 RTM/R2 and 2016 settings, though we've not used it in a long time.
I tried to evict Node 1, and when I try to rejoin it, it throws the error below.
H--My-Pictures-C1.PNG
Should we destroy the cluster and recreate it with the same nodes? If yes, what is the best practice for doing this?
I have two very thorough EE articles on all things Hyper-V:

Some Hyper-V Hardware and Software Best Practices
Practical Hyper-V Performance Expectations

This is a sample outline for setting up a cluster node.

The goal in a cluster setting is to remove as many single points of failure (SPFs) as is possible. NIC teaming is one such method of doing so.
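As a sketch of that teaming step, a NetLbfoTeam could be created like this — the team name and member adapter names are placeholders, and the members would normally be two ports dedicated to the team:

# Create a switch-independent LBFO team from two physical ports
New-NetLbfoTeam -Name "Team1" -TeamMembers "NIC1","NIC2" `
    -TeamingMode SwitchIndependent -LoadBalancingAlgorithm Dynamic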
 
 + BIOS and firmware up to date on all nodes prior
 + Install OS
 + Install drivers
 + Set up NetLbfoTeam
 + Install Hyper-V and Cluster Roles
 + Set up clustered storage and CSVs
 + Set up Hyper-V to work with C:\ClusterStorage out of the box for new VMs
 + Set up & bind the virtual switch/SET Switch (prefer not shared with host OS but port count must be >2)
 + Import VMs (assuming there's already workloads on the current cluster)

The above should be a good start.
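If the decision is to rejoin Node 1 rather than rebuild, the evict-and-rejoin cycle attempted earlier can be retried from PowerShell once the network errors are fixed. A sketch with placeholder names ("SQLCLUSTER" and "NODE1"):

# On Node 2 (the healthy node): confirm Node 1 is fully evicted
Get-ClusterNode

# On Node 1: clear any stale cluster configuration left behind
Clear-ClusterNode -Force

# From either node: rejoin Node 1 to the cluster
Add-ClusterNode -Cluster SQLCLUSTER -Name NODE1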
ASKER CERTIFIED SOLUTION
Techie solution
