Techie solution

asked on

Windows Cluster 2016 error: Node 1 unavailable after a restart

Windows Cluster 2016 error: Node 1 is unavailable after a restart (after installing Patch 1 for the SQL cluster). I pinged the IP and it responds, and I am able to RDP to the server. All disks are up and running; the only issue is that Node 1's network is showing as unavailable.
Please advise.
lcohan

So Node 1 was restarted after installing "Patch 1 for SQL cluster," and it came back up, right?
What exactly is "Patch 1 for SQL cluster" — was that SQL Server SP1, or something else?
Was the cluster running ALL resources on Node 2 prior to the patching/restart of Node 1?
If you RDP into Node 2, start Failover Cluster Manager on it, and connect to your cluster — what errors, if any, appear in the last 24 hours?
You should also right-click your cluster name there and run a "Validate Cluster..." report, which will show any potential issues with details.
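If you prefer PowerShell, the same validation and event review can be run from an elevated prompt on either node. A minimal sketch — "SQLCLUSTER" is a placeholder, substitute your actual cluster name:

# Run cluster validation (equivalent of "Validate Cluster..." in FCM)
Test-Cluster -Cluster SQLCLUSTER

# Review cluster-related events from the last 24 hours on this node
Get-WinEvent -FilterHashtable @{
    LogName   = 'Microsoft-Windows-FailoverClustering/Operational'
    StartTime = (Get-Date).AddDays(-1)
} | Select-Object TimeCreated, Id, LevelDisplayName, Message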
Techie solution

ASKER

It was running prior to the reboot. There is an error with event ID 5398.
In an elevated PowerShell session, please post the results of the following in a CODE snippet:
Get-NetAdapter
Get-NetIPAddress


PS C:\WINDOWS\system32> Get-NetAdapter

Name                      InterfaceDescription                    ifIndex Status       MacAddress             LinkSpeed
----                      --------------------                    ------- ------       ----------             ---------
Cluster                   Intel(R) 82574L Gigabit Network Co...#2       3 Up           00-50-xx.xx.xx-D0         1 Gbps
Production                Intel(R) 82574L Gigabit Network Conn...       4 Up           00-50-xx.xx.xx-99         1 Gbps



PS C:\WINDOWS\system32> Get-NetIPAddress


IPAddress         : fe80::b1f4:d8a3:a2d5:efe0%3
InterfaceIndex    : 3
InterfaceAlias    : Cluster
AddressFamily     : IPv6
Type              : Unicast
PrefixLength      : 64
PrefixOrigin      : WellKnown
SuffixOrigin      : Link
AddressState      : Preferred
ValidLifetime     : Infinite ([TimeSpan]::MaxValue)
PreferredLifetime : Infinite ([TimeSpan]::MaxValue)
SkipAsSource      : False
PolicyStore       : ActiveStore

IPAddress         : fe80::878:6c1:f868:6c66%2
InterfaceIndex    : 2
InterfaceAlias    : Local Area Connection* 2
AddressFamily     : IPv6
Type              : Unicast
PrefixLength      : 64
PrefixOrigin      : WellKnown
SuffixOrigin      : Link
AddressState      : Deprecated
ValidLifetime     : Infinite ([TimeSpan]::MaxValue)
PreferredLifetime : Infinite ([TimeSpan]::MaxValue)
SkipAsSource      : False
PolicyStore       : ActiveStore

IPAddress         : fe80::f18a:98:3a1c:ff4e%4
InterfaceIndex    : 4
InterfaceAlias    : Production
AddressFamily     : IPv6
Type              : Unicast
PrefixLength      : 64
PrefixOrigin      : WellKnown
SuffixOrigin      : Link
AddressState      : Preferred
ValidLifetime     : Infinite ([TimeSpan]::MaxValue)
PreferredLifetime : Infinite ([TimeSpan]::MaxValue)
SkipAsSource      : False
PolicyStore       : ActiveStore

IPAddress         : ::1
InterfaceIndex    : 1
InterfaceAlias    : Loopback Pseudo-Interface 1
AddressFamily     : IPv6
Type              : Unicast
PrefixLength      : 128
PrefixOrigin      : WellKnown
SuffixOrigin      : WellKnown
AddressState      : Preferred
ValidLifetime     : Infinite ([TimeSpan]::MaxValue)
PreferredLifetime : Infinite ([TimeSpan]::MaxValue)
SkipAsSource      : False
PolicyStore       : ActiveStore

IPAddress         : 192.xx.xx.2
InterfaceIndex    : 3
InterfaceAlias    : Cluster
AddressFamily     : IPv4
Type              : Unicast
PrefixLength      : 24
PrefixOrigin      : Manual
SuffixOrigin      : Manual
AddressState      : Preferred
ValidLifetime     : Infinite ([TimeSpan]::MaxValue)
PreferredLifetime : Infinite ([TimeSpan]::MaxValue)
SkipAsSource      : False
PolicyStore       : ActiveStore

IPAddress         : 169.xx.xx.102
InterfaceIndex    : 2
InterfaceAlias    : Local Area Connection* 2
AddressFamily     : IPv4
Type              : Unicast
PrefixLength      : 16
PrefixOrigin      : WellKnown
SuffixOrigin      : Link
AddressState      : Tentative
ValidLifetime     : Infinite ([TimeSpan]::MaxValue)
PreferredLifetime : Infinite ([TimeSpan]::MaxValue)
SkipAsSource      : False
PolicyStore       : ActiveStore

IPAddress         : 169.xx.xx.228
InterfaceIndex    : 2
InterfaceAlias    : Local Area Connection* 2
AddressFamily     : IPv4
Type              : Unicast
PrefixLength      : 16
PrefixOrigin      : Manual
SuffixOrigin      : Manual
AddressState      : Tentative
ValidLifetime     : Infinite ([TimeSpan]::MaxValue)
PreferredLifetime : Infinite ([TimeSpan]::MaxValue)
SkipAsSource      : False
PolicyStore       : ActiveStore

IPAddress         : 10.xx.xx.161
InterfaceIndex    : 4
InterfaceAlias    : Production
AddressFamily     : IPv4
Type              : Unicast
PrefixLength      : 24
PrefixOrigin      : Manual
SuffixOrigin      : Manual
AddressState      : Preferred
ValidLifetime     : Infinite ([TimeSpan]::MaxValue)
PreferredLifetime : Infinite ([TimeSpan]::MaxValue)
SkipAsSource      : False
PolicyStore       : ActiveStore

IPAddress         : 127.0.0.1
InterfaceIndex    : 1
InterfaceAlias    : Loopback Pseudo-Interface 1
AddressFamily     : IPv4
Type              : Unicast
PrefixLength      : 8
PrefixOrigin      : WellKnown
SuffixOrigin      : WellKnown
AddressState      : Preferred
ValidLifetime     : Infinite ([TimeSpan]::MaxValue)
PreferredLifetime : Infinite ([TimeSpan]::MaxValue)
SkipAsSource      : False
PolicyStore       : ActiveStore
I don't see even a NetLbfoTeam (NIC team) here?

It's preferable to set up a NIC team and bind the virtual switch to that team; in this case, with only two ports, the host OS would share that team and thus get an IP on the production network.

In a cluster setting, DNS A-record registration for production should come only from the IP of the management adapter.

To do so:
Set-DnsClient -InterfaceAlias Cl* -RegisterThisConnectionsAddress $False



A SET switch (Switch Embedded Teaming) would allow creating the two virtual NICs (vNICs) needed to run on the two subnets.
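A minimal sketch of creating such a SET switch with two host vNICs, assuming the physical adapters are named "Production" and "Cluster" as in the output above (switch and vNIC names are placeholders):

# Create a Switch Embedded Teaming (SET) switch over both physical ports
New-VMSwitch -Name "vSwitch-SET" -NetAdapterName "Production","Cluster" `
    -EnableEmbeddedTeaming $true -AllowManagementOS $false

# Add a host vNIC for each subnet
Add-VMNetworkAdapter -ManagementOS -Name "vNIC-Production" -SwitchName "vSwitch-SET"
Add-VMNetworkAdapter -ManagementOS -Name "vNIC-Cluster"    -SwitchName "vSwitch-SET"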
This configuration was not needed initially; it was working fine.
Did you look at the Failover Cluster Manager messages and try a "Validate Cluster..." report?
Any guidance please.
There are obvious errors that a systems/networking engineer would have to look at and fix before running cluster validation again and bringing it back online as a "cluster." Hopefully everything is still running well on Node 2, right? Just make sure automatic failover is disabled on it until Node 1 is brought back into the cluster. Alternatively, as this is a VM: do you by any chance have a snapshot of Node 1 taken just before the patch was applied? If yes, you could try rolling back to that snapshot on Node 1.
No, we don't have any snapshots. Also, can you guide me through how to troubleshoot this?
The log settings we use to troubleshoot are in this blog post: A Microsoft Cluster Troubleshooting Guide

Both FCM and PowerShell are useful for finding the problem.
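As a starting point on the PowerShell side, a sketch of gathering the cluster debug log and checking the event ID reported earlier (5398); the destination path is a placeholder:

# Generate the cluster debug log for the last 4 hours (240 minutes)
Get-ClusterLog -TimeSpan 240 -Destination C:\Temp

# Look for the reported event ID (5398); FailoverClustering events
# are typically surfaced in the System log
Get-WinEvent -FilterHashtable @{
    LogName = 'System'
    Id      = 5398
} -MaxEvents 10 | Format-List TimeCreated, Message

# Check node and network state as the cluster sees it
Get-ClusterNode
Get-ClusterNetwork
Get-ClusterNetworkInterface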


Message Analyzer can also be tuned to help in 2012 RTM/R2 and 2016 settings, though we've not used it in a long time.
I tried to evict Node 1, and when I try to rejoin it, it throws the error below.
H--My-Pictures-C1.PNG
Should we destroy the cluster and recreate it with the same nodes? If yes, what is the best practice for doing this?
I have two very thorough EE articles on all things Hyper-V:

Some Hyper-V Hardware and Software Best Practices
Practical Hyper-V Performance Expectations

This is a sample outline for setting up a cluster node.

The goal in a cluster setting is to remove as many single points of failure (SPFs) as is possible. NIC teaming is one such method of doing so.
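As a sketch of that teaming step, a NetLbfoTeam could be created like this — the team name and member adapter names are placeholders, and the members would normally be two ports dedicated to the team:

# Create a switch-independent LBFO team from two physical ports
New-NetLbfoTeam -Name "Team1" -TeamMembers "NIC1","NIC2" `
    -TeamingMode SwitchIndependent -LoadBalancingAlgorithm Dynamic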
 
 + BIOS and firmware up to date on all nodes prior
 + Install OS
 + Install drivers
 + Set up NetLbfoTeam
 + Install Hyper-V and Cluster Roles
 + Set up clustered storage and CSVs
 + Set up Hyper-V to work with C:\ClusterStorage out of the box for new VMs
 + Set up & bind the virtual switch/SET Switch (prefer not shared with host OS but port count must be >2)
 + Import VMs (assuming there's already workloads on the current cluster)

The above should be a good start.
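If the decision is to rejoin Node 1 rather than rebuild, the evict-and-rejoin cycle attempted earlier can be retried from PowerShell once the network errors are fixed. A sketch with placeholder names ("SQLCLUSTER" and "NODE1"):

# On Node 2 (the healthy node): confirm Node 1 is fully evicted
Get-ClusterNode

# On Node 1: clear any stale cluster configuration left behind
Clear-ClusterNode -Force

# From either node: rejoin Node 1 to the cluster
Add-ClusterNode -Cluster SQLCLUSTER -Name NODE1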
ASKER CERTIFIED SOLUTION
Techie solution
