• Status: Solved
Hyper-V cluster unstable

Dear Experts,

We need an urgent expert suggestion. Our Hyper-V cluster, which uses SMB storage, has become unstable lately for no obvious reason. VMs go missing out of nowhere, the cluster behaves erratically, and VMM loses communication with the hosts even though the hosts themselves are still running.

We had this situation twice earlier and somehow recovered, but it keeps recurring, roughly once every 10 to 15 days.

Please suggest where else to look for the cause, and whether there is a troubleshooting guide that would help us iron out the problem.

Thanks and looking forward

Best Regards
Sri M
1 Solution
 
Philip Elder (Technical Architect - HA/Compute/Storage) commented:
Run the Cluster Validation Wizard.

Make sure your SMB Multichannel paths are set up and constrained to the NIC ports between SMB storage and Hyper-V (New-SmbMultichannelConstraint).

This kind of behaviour has been seen before in situations where the SMB paths are not set up correctly especially not direct.
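
The two steps above could look roughly like the following when run on a Hyper-V node. This is a sketch, not a verified procedure: it assumes the FailoverClusters and SMB cmdlets are available on the node, and the server name and interface aliases are placeholders rather than values taken from this thread.

```powershell
# Placeholders (assumptions): substitute your own file server name and the
# aliases of the NICs that connect this Hyper-V node to the SMB storage.
$storageServer   = 'storage-server-name'
$storageAdapters = 'Storage NIC 1', 'Storage NIC 2'

# Run cluster validation from PowerShell; this produces a validation report
# comparable to the one from the Cluster Validation Wizard.
Test-Cluster

# Show the SMB Multichannel connections currently in use and any
# constraints that are already defined on this client.
Get-SmbMultichannelConnection
Get-SmbMultichannelConstraint

# Constrain SMB traffic for the storage server to the chosen interfaces.
New-SmbMultichannelConstraint -ServerName $storageServer -InterfaceAlias $storageAdapters
```

The constraint is defined on the SMB client, so it would need to be applied on each Hyper-V node. Running `Get-SmbMultichannelConnection` again afterwards shows whether traffic is using the intended interfaces.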
 
Sri M (CEO, Author) commented:
Thank you Philip for the quick response

I will do the validation and update here
 
Sri M (CEO, Author) commented:
Hello Philip,

I ran cluster validation and also verified the SMB interface isolation. Cluster validation passed with a few warnings.

Could you have a look at the attached configuration from one of our nodes and tell us whether there are any issues we need to address? One 10 Gigabit Ethernet card on this node is showing up as 1 Gigabit because of a bad physical cable. We verified the teaming and all seems good. We are unable to find the root cause of the cluster going unstable out of nowhere.

Thank you

Regards
SMB-INTERFACE.txt
 
Philip Elder (Technical Architect - HA/Compute/Storage) commented:
Please post the results of the following, within a CODE window, for each node:

Get-NetAdapter
Get-NetLbfoTeam
Get-VMSwitch
Get-SmbMultichannelConnection
Get-SmbMultiChannelConstraint
Get-VM
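
One way to gather the output of the commands listed above from every node in a single pass is sketched below. It assumes PowerShell remoting is enabled between the nodes and that the script runs on a cluster member with the FailoverClusters module; the output file names are hypothetical.

```powershell
# The diagnostic commands requested above.
$commands = 'Get-NetAdapter', 'Get-NetLbfoTeam', 'Get-VMSwitch',
            'Get-SmbMultichannelConnection', 'Get-SmbMultichannelConstraint', 'Get-VM'

# Assumption: the cluster node names resolve and accept remoting.
foreach ($node in (Get-ClusterNode).Name) {
    $report = Invoke-Command -ComputerName $node -ArgumentList (,$commands) -ScriptBlock {
        param([string[]]$Commands)
        foreach ($c in $Commands) {
            "PS> $c"
            # Render on the remote side so the default table/list layout
            # survives the trip back as plain text.
            & $c | Out-String -Width 250
        }
    }
    # Hypothetical file name: one text file per node in the current folder.
    $report | Set-Content -Path ".\${node}-config.txt"
}
```

Each resulting file can then be pasted into its own CODE window.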

What does "hyper v cluster formed on SMB as storage" mean? What's the layout for the setup including network fabric(s) please?
 
Sri M (CEO, Author) commented:
Hello Philip

I appreciate your quick response. I collected the results, and it seems we are still in a lot of trouble.

We are using a single-node SMB storage server over 10 Gigabit Ethernet for our cluster storage, and we are adding two more nodes in a few weeks. We have been running this cluster for almost two years with no issues, and out of nowhere all these issues are popping up. Kindly let me know your suggestions.

PS C:\Users\administrator.czmedc> Get-NetAdapter

Name                      InterfaceDescription                    ifIndex Status       MacAddress             LinkSpeed
----                      --------------------                    ------- ------       ----------             ---------
VM_Network_Logic switch   Microsoft Network Adapter Multiple...#2      31 Up           3C-A8-2A-22-C8-79         1 Gbps
vEthernet (Management_... Hyper-V Virtual Ethernet Adapter #2          30 Up           3C-A8-2A-22-C8-78        10 Gbps
Ethernet 4                Intel(R) Ethernet Converged Networ...#4      19 Up           A0-36-9F-3E-E9-B0         1 Gbps
Ethernet 3                Intel(R) Ethernet Converged Networ...#3      18 Disconnected A0-36-9F-3E-E9-AE          0 bps
Embedded LOM 1 Port 4     HP Ethernet 1Gb 4-port 331i Adapter #4       17 Up           3C-A8-2A-22-C8-7B         1 Gbps
Embedded LOM 1 Port 3     HP Ethernet 1Gb 4-port 331i Adapter #3       16 Disconnected 3C-A8-2A-22-C8-7A          0 bps
Embedded LOM 1 Port 2     HP Ethernet 1Gb 4-port 331i Adapter #2       15 Up           3C-A8-2A-22-C8-79         1 Gbps
Embedded LOM 1 Port 1     HP Ethernet 1Gb 4-port 331i Adapter          14 Up           3C-A8-2A-22-C8-78         1 Gbps
Ethernet                  Intel(R) Ethernet Converged Network ...      12 Disconnected A0-36-9F-3E-EA-28          0 bps
Ethernet 2                Intel(R) Ethernet Converged Networ...#2      13 Up           A0-36-9F-3E-EA-26        10 Gbps
storage_team              Microsoft Network Adapter Multiplexo...      28 Up           A0-36-9F-3E-EA-26        11 Gbps


PS C:\Users\administrator.czmedc> Get-Netlbfoteam

Name                   : storage_team
Members                : {Ethernet 2, Ethernet, Ethernet 3, Ethernet 4}
TeamNics               : storage_team
TeamingMode            : SwitchIndependent
LoadBalancingAlgorithm : Dynamic
Status                 : Degraded

Name                   : VM_Network_Logic switch
Members                : {Embedded LOM 1 Port 3, Embedded LOM 1 Port 2}
TeamNics               : VM_Network_Logic switch
TeamingMode            : SwitchIndependent
LoadBalancingAlgorithm : HyperVPort
Status                 : Degraded

PS C:\Users\administrator.czmedc> Get-VMSwitch

Name                    SwitchType NetAdapterInterfaceDescription
----                    ---------- ------------------------------
Management_Network      External   HP Ethernet 1Gb 4-port 331i Adapter
VM_Network_Logic switch External   Microsoft Network Adapter Multiplexor Driver #2

PS C:\Users\administrator.czmedc> Get-SmbMultichannelConnection

Server Name    Selected       Client IP      Server IP      Client         Server         Client RSS     Client RDMA
                                                            Interface      Interface      Capable        Capable
                                                            Index          Index
-----------    --------       ---------      ---------      -------------- -------------- -------------- --------------
czme-storag... True           10.0.1.9       10.0.1.4       28             24             True           False


PS C:\Users\administrator.czmedc> Get-SmbMultiChannelConstraint

No result

PS C:\Users\administrator.czmedc> Get-VM

Name                             State           CPUUsage(%) MemoryAssigned(M) Uptime     Status
----                             -----           ----------- ----------------- ------     ------
Backup-Server                    Off             0           0                 00:00:00   Operating normally
FilmFactory-WebServer            Running         0           4096              1.20:57:18 Operating normally
GDH-CloudServer(AD)-1            Running         0           2048              1.20:20:22 Operating normally
GDH-CloudServer(Backup)-3        Running         0           5628              1.15:22:31 Operating normally
GDH-CloudServer(WSUS)-2          Running         0           2048              1.20:06:54 Operating normally
GDH-POC Abhishek                 Running         0           8192              1.21:24:24 Operating normally
GDH-POC-Abhishek-Vm2             RunningCritical 0           8192              2.18:41:55 Cannot connect to virtual machine configuration storage
GDH-TM5                          RunningCritical 0           4096              2.18:47:24 Cannot connect to virtual machine configuration storage
Gulf-Packagings-VM1              RunningCritical 0           3968              2.18:48:06 Cannot connect to virtual machine configuration storage
Priamary-DNS                     RunningCritical 0           2048              2.18:40:50 Cannot connect to virtual machine configuration storage
Secondary-DNS                    Running         0           2048              1.20:37:15 Operating normally
Srikanth-WHMC-Automation-Centos7 RunningCritical 0           2046              2.18:49:00 Cannot connect to virtual machine configuration storage
TecomGroup-VM1                   RunningCritical 6           4096              2.09:02:58 Cannot connect to virtual machine configuration storage
Test VM                          Off             0           0                 00:00:00   Operating normally


Thanks

Best Regards
 
Sri M (CEO, Author) commented:
Hello Philip

Below is the output from another node:

PS C:\Users\administrator.czme> Get-NetAdapter

Name                      InterfaceDescription                    ifIndex Status       MacAddress             LinkSpeed
----                      --------------------                    ------- ------       ----------             ---------
Slot 02 Port 2            Intel(R) Ethernet Converged Networ...#4      23 Up           A0-36-9F-3E-E9-A8        10 Gbps
Slot 02 Port 1            Intel(R) Ethernet Converged Networ...#3      22 Disconnected A0-36-9F-3E-E9-A6          0 bps
vEthernet (Management_... Hyper-V Virtual Ethernet Adapter #2          41 Up           A0-36-9F-A2-23-00        10 Gbps
Slot 03 Port 2            Intel(R) Ethernet Converged Networ...#2      21 Disconnected A0-36-9F-3E-E9-D8          0 bps
Slot 03 Port 1            Intel(R) Ethernet Converged Network ...      20 Up           A0-36-9F-3E-E9-D6        10 Gbps
Slot 07 Port 4            Intel(R) Ethernet Server Adapter I...#4      17 Disconnected A0-36-9F-A2-23-03          0 bps
Slot 07 Port 3            Intel(R) Ethernet Server Adapter I...#2      14 Disconnected A0-36-9F-A2-23-02          0 bps
Slot 07 Port 2            Intel(R) Ethernet Server Adapter I...#3      15 Up           A0-36-9F-A2-23-01         1 Gbps
Slot 07 Port 1            Intel(R) Ethernet Server Adapter I35...      12 Up           A0-36-9F-A2-23-00         1 Gbps
storage_team              Microsoft Network Adapter Multiple...#3      33 Up           A0-36-9F-3E-E9-D6        20 Gbps
Onboard CNA Port 4        Emulex OneConnect OCe14000  10Gb E...#4      19 Disabled     90-1B-0E-AC-AA-33         1 Gbps
Onboard CNA Port 3        Emulex OneConnect OCe14000  10Gb E...#2      16 Up           90-1B-0E-AC-AA-32         1 Gbps
Onboard CNA Port 2        Emulex OneConnect OCe14000  10Gb Eth...      13 Up           90-1B-0E-AC-AA-31         1 Gbps
Onboard CNA Port 1        Emulex OneConnect OCe14000  10Gb E...#3      18 Up           90-1B-0E-AC-AA-30         1 Gbps
VM_Network_Logic switch   Microsoft Network Adapter Multiple...#2      35 Up           90-1B-0E-AC-AA-31         2 Gbps


PS C:\Users\administrator.czme> Get-NetLBFOTeam


Name                   : VM_Network_Logic switch
Members                : {Slot 07 Port 2, Onboard CNA Port 2}
TeamNics               : VM_Network_Logic switch
TeamingMode            : SwitchIndependent
LoadBalancingAlgorithm : HyperVPort
Status                 : Up

Name                   : storage_team
Members                : {Slot 03 Port 1, Slot 02 Port 1, Slot 02 Port 2, Slot 03 Port 2}
TeamNics               : storage_team
TeamingMode            : SwitchIndependent
LoadBalancingAlgorithm : Dynamic
Status                 : Degraded


PS C:\Users\administrator.czme> Get-VMSwitch

Name                    SwitchType NetAdapterInterfaceDescription
----                    ---------- ------------------------------
VM_Network_Logic switch External   Microsoft Network Adapter Multiplexor Driver #2
Management_Network      External   Intel(R) Ethernet Server Adapter I350-T4

PS C:\Users\administrator.czme> Get-SMBMultiChannelConnection

Server Name         Selected            Client IP           Server IP           Client Interface    Server Interface    Client RSS Capable  Client RDMA Capable
                                                                                Index               Index
-----------         --------            ---------           ---------           ------------------- ------------------- ------------------  -------------------
czme-storage1.CL... True                10.0.1.6            10.0.1.4            33                  24                  True                False


PS C:\Users\administrator.czme> Get-SMBMultiChannelConstraint

PS C:\Users\administrator.czme> Get-VM

Name                           State   CPUUsage(%) MemoryAssigned(M) Uptime     Status
----                           -----   ----------- ----------------- ------     ------
BB-AppServer1                  Running 7           2524              1.22:47:52 Operating normally
BB-AppServer3                  Running 1           4096              1.22:48:21 Operating normally
BB-AppServer4                  Running 0           2048              1.22:47:27 Operating normally
BB-DBServer1                   Running 4           6144              1.22:46:54 Operating normally
Czme-DomainController          Running 0           2048              2.01:42:05 Operating normally
CZME-MailServer1               Running 0           2796              2.01:27:00 Operating normally
FMICS-Windows2012R2-VM2        Running 0           32768             2.01:28:14 Operating normally
FMICS-Windows2012R2-VM1        Running 0           32768             2.01:27:28 Operating normally
SOLUTIONS_POC_Server1          Running 0           4096              1.15:27:49 Operating normally
SOLUTIONS_POC_Server2          Running 0           2048              1.15:28:22 Operating normally
Cloud Server-4 (OSSEC)         Running 0           2048              1.21:27:46 Operating normally
TM6                            Running 0           2048              2.01:34:54 Operating normally
Win2012R2_VM2                  Running 0           4096              2.01:26:30 Operating normally
PL_Firewall-New                Running 0           6064              1.15:31:48 Operating normally
PL-New-Firewall                Running 1           8192              1.17:46:09 Operating normally
PL-WinVM1                      Running 1           10240             2.01:02:28 Operating normally
PL-WinVM2                      Running 0           10240             2.00:58:55 Operating normally
PL-WinVm3 DBServer             Running 1           65536             2.00:52:43 Operating normally
PL-WinVM4                      Running 3           10240             1.16:37:24 Operating normally
PL-WinVM5                      Running 0           10240             2.00:45:57 Operating normally
PL-WinVM-6 2008                Running 0           10240             1.16:30:10 Operating normally
PL-Test VM                     Running 0           4096              1.21:38:39 Operating normally
Tmk-VM1                        Running 0           2048              2.01:28:48 Operating normally

 
Philip Elder (Technical Architect - HA/Compute/Storage) commented:
Are the above results from the Hyper-V nodes? Or is the bottom one from storage?
 
Sri M (CEO, Author) commented:
Both are from Hyper-V nodes. Should I provide the output from the storage server as well?

Regards
 
Philip Elder (Technical Architect - HA/Compute/Storage) commented:
I'm a bit confused. The outputs from the Hyper-V nodes are very dissimilar. Why is that?

Normally, we configure our Hyper-V hosts identically within a generation, and use an identical series of components for the next-generation configuration.
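
To see how far two hosts have drifted apart, their adapter inventories can be diffed. The sketch below is only a starting point: the node names are hypothetical, PowerShell remoting is assumed to be enabled, and adapter descriptions that differ only in their numbering suffix will also show up as differences.

```powershell
# Hypothetical node names (assumption); replace with your own.
$nodeA = 'hv-node-1'
$nodeB = 'hv-node-2'

# Collect name, description, status and link speed of every adapter on a node.
$getNics = {
    Get-NetAdapter |
        Sort-Object Name |
        Select-Object Name, InterfaceDescription, Status, LinkSpeed
}

$nicsA = Invoke-Command -ComputerName $nodeA -ScriptBlock $getNics
$nicsB = Invoke-Command -ComputerName $nodeB -ScriptBlock $getNics

# List description/status/speed combinations that appear on only one node.
# SideIndicator '<=' means only on $nodeA, '=>' means only on $nodeB.
Compare-Object -ReferenceObject $nicsA -DifferenceObject $nicsB `
               -Property InterfaceDescription, Status, LinkSpeed |
    Sort-Object InterfaceDescription |
    Format-Table -AutoSize
```

The same pattern works for `Get-NetLbfoTeam` or `Get-VMSwitch` output if team and switch layout should be compared as well.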
 
Sri M (CEO, Author) commented:
Hi Philip,

Thank you for your response

Allow me to explain

All Hyper-V nodes have the same configuration, except that the make is different. Each node has two dual-port 10 GbE cards and two quad-port 1 GbE cards. The first ports of the 10 GbE cards are teamed for hardware-failure redundancy. On the quad-port cards, the first port is used for the management network (not teamed), and the second port of both quad-port 1 GbE cards is used for VLAN traffic (teamed again for redundancy).

This is one of our old clusters; it has been running for almost two years without a problem, but lately things have become unstable. We have also formed a new 2016 cluster that follows Microsoft's recommendations: the hosts are of the same generation with identical components.

Let me know if you need further information

Regards
 
Philip Elder (Technical Architect - HA/Compute/Storage) commented:
Please have a look at my EE article: Some Hyper-V Hardware and Software Best Practices.

The host should only have one IP address via port 0 on the two Gigabit NICs.

One node has Emulex while the other does not?

What changes were made at the time things became unstable? Is there a change control list? Back step through that change log to find what caused the instability.
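
Where no formal change log exists, part of the history can sometimes be reconstructed from the nodes themselves. The sketch below is an assumption-laden starting point: it only covers installed Windows updates and the cluster log, not hardware or manual configuration changes, and the destination folder is hypothetical.

```powershell
# Recently installed Windows updates on this node, newest first.
Get-HotFix |
    Where-Object { $_.InstalledOn } |
    Sort-Object InstalledOn -Descending |
    Select-Object -First 20 HotFixID, Description, InstalledOn

# Assumption: C:\Temp\ClusterLogs is a hypothetical folder that already exists.
# Dump the last 24 hours (1440 minutes) of the cluster log from every node.
Get-ClusterLog -Destination 'C:\Temp\ClusterLogs' -TimeSpan 1440
```

Lining up the update dates with the times of the outages may narrow down which change to step back first.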
 
Sri M (CEO, Author) commented:
Hello Philip

Thank you for your valuable input. Understood; we will review this setup and fix the differences. I will get back to you if that doesn't fix the instability.

Regarding stepping back through the changes, it is a bit hard to say what caused it: we made many changes at once, such as updates and node hardware upgrades. Reverting all of them would be risky for us now; however, we are planning to change the configuration of all nodes to match and see how it goes from there.

Since this might take long time I will close this question.

I really appreciate your time in this regard.

Have a good day

Best Regards.
 
Sri M (CEO, Author) commented:
Thank you. The information shared is really helpful.
Question has a verified solution.
