Hyper-V cluster unstable

Posted on 2017-05-29
Last Modified: 2017-06-03
Dear Experts,

I need urgent expert advice. Our hyper v cluster formed on SMB as storage has been going unstable lately for some reason. VMs just go missing out of nowhere, the cluster behaves erratically, and VMM loses communication with a host even though the host itself is still running.

This has happened twice before and we somehow recovered, but it is now recurring roughly once every 10 to 15 days.

Please suggest where else we should look for the cause, and whether there is a troubleshooting guide that would help us iron out the problem.

Thanks, and looking forward to your suggestions.

Best Regards
Question by:Sri M
 

Expert Comment

by:Philip Elder
ID: 42154921
Run the Cluster Validation Wizard.

Make sure your SMB Multichannel paths are set up and constrained to the NIC ports between SMB storage and Hyper-V (New-SmbMultichannelConstraint).

We have seen this kind of behaviour before in setups where the SMB paths are not set up correctly, especially where they are not direct.
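A quick way to check whether a host really has more than one SMB path to the storage server is to capture the output of Get-SmbMultichannelConnection and count the distinct client/server interface pairs marked as selected. The Python sketch below is a hypothetical helper, not part of any Microsoft tooling; it assumes the default table layout (server name without spaces, then Selected, Client IP, Server IP, Client Interface Index and Server Interface Index) and ignores header and wrapped lines.

```python
import re
from collections import defaultdict

# One data row of Get-SmbMultichannelConnection's default table (assumed layout).
ROW = re.compile(
    r"^(?P<server>\S+)\s+(?P<selected>True|False)\s+"
    r"(?P<client_ip>\d{1,3}(?:\.\d{1,3}){3})\s+"
    r"(?P<server_ip>\d{1,3}(?:\.\d{1,3}){3})\s+"
    r"(?P<client_if>\d+)\s+(?P<server_if>\d+)"
)


def smb_paths(text):
    """Map each server name to the set of selected (client_if, server_if) pairs."""
    paths = defaultdict(set)
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m and m["selected"] == "True":
            paths[m["server"]].add((int(m["client_if"]), int(m["server_if"])))
    return dict(paths)


def single_path_servers(text):
    """Servers that are reached through only one selected interface pair."""
    return sorted(name for name, pairs in smb_paths(text).items() if len(pairs) < 2)
```

A single selected pair means SMB Multichannel has only one interface (for example, one team NIC) to use toward that server, so any path redundancy depends entirely on what sits underneath that interface.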
 

Author Comment

by:Sri M
ID: 42155050
Thank you, Philip, for the quick response.

I will run the validation and post an update here.
 

Author Comment

by:Sri M
ID: 42155200
Hello Philip,

I ran cluster validation and also verified the SMB interface isolation. Validation passed with a few warnings.

Could you have a look at the attached configuration from one of our nodes and tell us whether there are any issues we need to address? On this node one 10 Gig Ethernet port is showing up as 1 Gig because of a bad physical cable. We verified the teaming and all seems good. We are unable to find the root cause of the cluster becoming unstable out of nowhere.

Thank you

Regards
Attachment: SMB-INTERFACE.txt
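Because one 10 GbE port has negotiated only 1 Gbps, it can help to scan the Get-NetAdapter table for ports running below what their hardware should deliver. A minimal Python sketch, assuming the default fixed-width table (column boundaries are taken from the dashed underline under the header) and a hypothetical mapping from an InterfaceDescription keyword to the minimum expected speed in Gbps:

```python
import re

# Trailing LinkSpeed value, e.g. "10 Gbps" or "0 bps".
SPEED = re.compile(r"(\d+(?:\.\d+)?)\s*(bps|Kbps|Mbps|Gbps)\s*$")
TO_GBPS = {"bps": 1e-9, "Kbps": 1e-6, "Mbps": 1e-3, "Gbps": 1.0}


def column_starts(underline):
    """Start offset of each run of dashes in the header underline."""
    return [m.start() for m in re.finditer(r"-+", underline)]


def slow_links(get_netadapter_text, expected_gbps):
    """Names of adapters that are Up but slower than expected for their description.

    expected_gbps maps a substring of InterfaceDescription (a hypothetical
    choice such as "Converged") to the minimum link speed in Gbps.
    """
    lines = get_netadapter_text.splitlines()
    underline = next((i for i, l in enumerate(lines) if l.startswith("----")), None)
    if underline is None:
        raise ValueError("no header underline found")
    # Columns: Name, InterfaceDescription, ifIndex, Status, MacAddress, LinkSpeed
    cols = column_starts(lines[underline])
    slow = []
    for line in lines[underline + 1:]:
        speed = SPEED.search(line)
        if not speed:
            continue
        name = line[cols[0]:cols[1]].strip()
        desc = line[cols[1]:cols[2]].strip()
        status = line[cols[3]:cols[4]].strip()
        if status != "Up":
            continue  # disconnected or disabled ports are a separate problem
        gbps = float(speed.group(1)) * TO_GBPS[speed.group(2)]
        for keyword, minimum in expected_gbps.items():
            if keyword in desc and gbps < minimum:
                slow.append(name)
                break
    return slow
```

Against the first node's Get-NetAdapter output posted later in this thread, {"Converged": 10} should flag only Ethernet 4, the Intel converged port that is up at 1 Gbps.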

Expert Comment

by:Philip Elder
ID: 42157340
Please post the results of the following for each node, inside a CODE window:

Get-NetAdapter
Get-NetLbfoTeam
Get-VMSwitch
Get-SmbMultichannelConnection
Get-SmbMultiChannelConstraint
Get-VM

What does "hyper v cluster formed on SMB as storage" mean? What is the layout of the setup, including the network fabric(s)?
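When the same six commands are run on every node, a saved console session can be split back into per-command sections before the nodes are compared. A small Python sketch, assuming the transcript keeps the default "PS <path>>" prompt; command names are lower-cased because PowerShell resolves them case-insensitively (this thread uses both Get-Netlbfoteam and Get-NetLBFOTeam). If a command appears twice, the later output wins.

```python
import re

# A prompt line such as "PS C:\Users\admin> Get-VM" (assumed default prompt).
PROMPT = re.compile(r"^PS [^>]*>\s*(?P<cmd>\S.*?)\s*$")


def split_transcript(text):
    """Split a PowerShell console transcript into {command: output} sections."""
    sections, current, buf = {}, None, []
    for line in text.splitlines():
        m = PROMPT.match(line)
        if m:
            if current is not None:
                sections[current] = "\n".join(buf).strip("\n")
            current, buf = m["cmd"].lower(), []
        elif current is not None:
            buf.append(line)
    if current is not None:
        sections[current] = "\n".join(buf).strip("\n")
    return sections
```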
 

Author Comment

by:Sri M
ID: 42158128
Hello Philip

I appreciate your quick response. I got the results, and it seems we are still in a lot of trouble.

We are using a single-node SMB storage server over 10 Gig Ethernet for our cluster storage, and we are adding two more nodes in a few weeks. We have been running this cluster for almost two years with no issues, and out of nowhere all these issues are popping up. Kindly let me know your suggestions.

PS C:\Users\administrator.czmedc> Get-NetAdapter

Name                      InterfaceDescription                    ifIndex Status       MacAddress             LinkSpeed
----                      --------------------                    ------- ------       ----------             ---------
VM_Network_Logic switch   Microsoft Network Adapter Multiple...#2      31 Up           3C-A8-2A-22-C8-79         1 Gbps
vEthernet (Management_... Hyper-V Virtual Ethernet Adapter #2          30 Up           3C-A8-2A-22-C8-78        10 Gbps
Ethernet 4                Intel(R) Ethernet Converged Networ...#4      19 Up           A0-36-9F-3E-E9-B0         1 Gbps
Ethernet 3                Intel(R) Ethernet Converged Networ...#3      18 Disconnected A0-36-9F-3E-E9-AE          0 bps
Embedded LOM 1 Port 4     HP Ethernet 1Gb 4-port 331i Adapter #4       17 Up           3C-A8-2A-22-C8-7B         1 Gbps
Embedded LOM 1 Port 3     HP Ethernet 1Gb 4-port 331i Adapter #3       16 Disconnected 3C-A8-2A-22-C8-7A          0 bps
Embedded LOM 1 Port 2     HP Ethernet 1Gb 4-port 331i Adapter #2       15 Up           3C-A8-2A-22-C8-79         1 Gbps
Embedded LOM 1 Port 1     HP Ethernet 1Gb 4-port 331i Adapter          14 Up           3C-A8-2A-22-C8-78         1 Gbps
Ethernet                  Intel(R) Ethernet Converged Network ...      12 Disconnected A0-36-9F-3E-EA-28          0 bps
Ethernet 2                Intel(R) Ethernet Converged Networ...#2      13 Up           A0-36-9F-3E-EA-26        10 Gbps
storage_team              Microsoft Network Adapter Multiplexo...      28 Up           A0-36-9F-3E-EA-26        11 Gbps


PS C:\Users\administrator.czmedc> Get-Netlbfoteam

Name                   : storage_team
Members                : {Ethernet 2, Ethernet, Ethernet 3, Ethernet 4}
TeamNics               : storage_team
TeamingMode            : SwitchIndependent
LoadBalancingAlgorithm : Dynamic
Status                 : Degraded

Name                   : VM_Network_Logic switch
Members                : {Embedded LOM 1 Port 3, Embedded LOM 1 Port 2}
TeamNics               : VM_Network_Logic switch
TeamingMode            : SwitchIndependent
LoadBalancingAlgorithm : HyperVPort
Status                 : Degraded

PS C:\Users\administrator.czmedc> Get-VMSwitch

Name                    SwitchType NetAdapterInterfaceDescription
----                    ---------- ------------------------------
Management_Network      External   HP Ethernet 1Gb 4-port 331i Adapter
VM_Network_Logic switch External   Microsoft Network Adapter Multiplexor Driver #2

PS C:\Users\administrator.czmedc> Get-SmbMultichannelConnection

Server Name    Selected       Client IP      Server IP      Client         Server         Client RSS     Client RDMA
                                                            Interface      Interface      Capable        Capable
                                                            Index          Index
-----------    --------       ---------      ---------      -------------- -------------- -------------- --------------
czme-storag... True           10.0.1.9       10.0.1.4       28             24             True           False


PS C:\Users\administrator.czmedc> Get-SmbMultiChannelConstraint

No result

PS C:\Users\administrator.czmedc> Get-VM

Name                             State           CPUUsage(%) MemoryAssigned(M) Uptime     Status
----                             -----           ----------- ----------------- ------     ------
Backup-Server                    Off             0           0                 00:00:00   Operating normally
FilmFactory-WebServer            Running         0           4096              1.20:57:18 Operating normally
GDH-CloudServer(AD)-1            Running         0           2048              1.20:20:22 Operating normally
GDH-CloudServer(Backup)-3        Running         0           5628              1.15:22:31 Operating normally
GDH-CloudServer(WSUS)-2          Running         0           2048              1.20:06:54 Operating normally
GDH-POC Abhishek                 Running         0           8192              1.21:24:24 Operating normally
GDH-POC-Abhishek-Vm2             RunningCritical 0           8192              2.18:41:55 Cannot connect to virtual machine configuration storage
GDH-TM5                          RunningCritical 0           4096              2.18:47:24 Cannot connect to virtual machine configuration storage
Gulf-Packagings-VM1              RunningCritical 0           3968              2.18:48:06 Cannot connect to virtual machine configuration storage
Priamary-DNS                     RunningCritical 0           2048              2.18:40:50 Cannot connect to virtual machine configuration storage
Secondary-DNS                    Running         0           2048              1.20:37:15 Operating normally
Srikanth-WHMC-Automation-Centos7 RunningCritical 0           2046              2.18:49:00 Cannot connect to virtual machine configuration storage
TecomGroup-VM1                   RunningCritical 6           4096              2.09:02:58 Cannot connect to virtual machine configuration storage
Test VM                          Off             0           0                 00:00:00   Operating normally



Thanks

Best Regards
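Six VMs on this node are RunningCritical with the status "Cannot connect to virtual machine configuration storage", which suggests the host has lost access to the SMB share holding their configuration while the guests keep running. To pull such VMs out of a long Get-VM listing (or out of saved output from several hosts), a minimal Python sketch; the regular expression assumes the default Get-VM table columns and only knows the state names listed in it:

```python
import re

# Name, State, CPUUsage(%), MemoryAssigned(M), Uptime, Status (default Get-VM table).
# Longer state names come first; extend the list if other states appear.
ROW = re.compile(
    r"^(?P<name>.+?)\s+"
    r"(?P<state>RunningCritical|Running|OffCritical|Off|"
    r"SavedCritical|Saved|PausedCritical|Paused)\s+"
    r"(?P<cpu>\d+)\s+(?P<mem>\d+)\s+(?P<uptime>\S+)\s+(?P<status>.+?)\s*$"
)


def critical_vms(get_vm_text):
    """Return (name, status) for every VM whose state ends in 'Critical'."""
    result = []
    for line in get_vm_text.splitlines():
        m = ROW.match(line)
        if m and m["state"].endswith("Critical"):
            result.append((m["name"], m["status"]))
    return result
```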
 

Author Comment

by:Sri M
ID: 42158142
Hello Philip

Below is the output from another node.

PS C:\Users\administrator.czme> Get-NetAdapter

Name                      InterfaceDescription                    ifIndex Status       MacAddress             LinkSpeed
----                      --------------------                    ------- ------       ----------             ---------
Slot 02 Port 2            Intel(R) Ethernet Converged Networ...#4      23 Up           A0-36-9F-3E-E9-A8        10 Gbps
Slot 02 Port 1            Intel(R) Ethernet Converged Networ...#3      22 Disconnected A0-36-9F-3E-E9-A6          0 bps
vEthernet (Management_... Hyper-V Virtual Ethernet Adapter #2          41 Up           A0-36-9F-A2-23-00        10 Gbps
Slot 03 Port 2            Intel(R) Ethernet Converged Networ...#2      21 Disconnected A0-36-9F-3E-E9-D8          0 bps
Slot 03 Port 1            Intel(R) Ethernet Converged Network ...      20 Up           A0-36-9F-3E-E9-D6        10 Gbps
Slot 07 Port 4            Intel(R) Ethernet Server Adapter I...#4      17 Disconnected A0-36-9F-A2-23-03          0 bps
Slot 07 Port 3            Intel(R) Ethernet Server Adapter I...#2      14 Disconnected A0-36-9F-A2-23-02          0 bps
Slot 07 Port 2            Intel(R) Ethernet Server Adapter I...#3      15 Up           A0-36-9F-A2-23-01         1 Gbps
Slot 07 Port 1            Intel(R) Ethernet Server Adapter I35...      12 Up           A0-36-9F-A2-23-00         1 Gbps
storage_team              Microsoft Network Adapter Multiple...#3      33 Up           A0-36-9F-3E-E9-D6        20 Gbps
Onboard CNA Port 4        Emulex OneConnect OCe14000  10Gb E...#4      19 Disabled     90-1B-0E-AC-AA-33         1 Gbps
Onboard CNA Port 3        Emulex OneConnect OCe14000  10Gb E...#2      16 Up           90-1B-0E-AC-AA-32         1 Gbps
Onboard CNA Port 2        Emulex OneConnect OCe14000  10Gb Eth...      13 Up           90-1B-0E-AC-AA-31         1 Gbps
Onboard CNA Port 1        Emulex OneConnect OCe14000  10Gb E...#3      18 Up           90-1B-0E-AC-AA-30         1 Gbps
VM_Network_Logic switch   Microsoft Network Adapter Multiple...#2      35 Up           90-1B-0E-AC-AA-31         2 Gbps


PS C:\Users\administrator.czme> Get-NetLBFOTeam


Name                   : VM_Network_Logic switch
Members                : {Slot 07 Port 2, Onboard CNA Port 2}
TeamNics               : VM_Network_Logic switch
TeamingMode            : SwitchIndependent
LoadBalancingAlgorithm : HyperVPort
Status                 : Up

Name                   : storage_team
Members                : {Slot 03 Port 1, Slot 02 Port 1, Slot 02 Port 2, Slot 03 Port 2}
TeamNics               : storage_team
TeamingMode            : SwitchIndependent
LoadBalancingAlgorithm : Dynamic
Status                 : Degraded


PS C:\Users\administrator.czme> Get-VMSwitch

Name                    SwitchType NetAdapterInterfaceDescription
----                    ---------- ------------------------------
VM_Network_Logic switch External   Microsoft Network Adapter Multiplexor Driver #2
Management_Network      External   Intel(R) Ethernet Server Adapter I350-T4

PS C:\Users\administrator.czme> Get-SMBMultiChannelConnection

Server Name         Selected            Client IP           Server IP           Client Interface    Server Interface    Client RSS Capable  Client RDMA Capable
                                                                                Index               Index
-----------         --------            ---------           ---------           ------------------- ------------------- ------------------  -------------------
czme-storage1.CL... True                10.0.1.6            10.0.1.4            33                  24                  True                False


PS C:\Users\administrator.czme> Get-SMBMultiChannelConstraint

PS C:\Users\administrator.czme> Get-VM

Name                           State   CPUUsage(%) MemoryAssigned(M) Uptime     Status
----                           -----   ----------- ----------------- ------     ------
BB-AppServer1                  Running 7           2524              1.22:47:52 Operating normally
BB-AppServer3                  Running 1           4096              1.22:48:21 Operating normally
BB-AppServer4                  Running 0           2048              1.22:47:27 Operating normally
BB-DBServer1                   Running 4           6144              1.22:46:54 Operating normally
Czme-DomainController          Running 0           2048              2.01:42:05 Operating normally
CZME-MailServer1               Running 0           2796              2.01:27:00 Operating normally
FMICS-Windows2012R2-VM2        Running 0           32768             2.01:28:14 Operating normally
FMICS-Windows2012R2-VM1        Running 0           32768             2.01:27:28 Operating normally
SOLUTIONS_POC_Server1          Running 0           4096              1.15:27:49 Operating normally
SOLUTIONS_POC_Server2          Running 0           2048              1.15:28:22 Operating normally
Cloud Server-4 (OSSEC)         Running 0           2048              1.21:27:46 Operating normally
TM6                            Running 0           2048              2.01:34:54 Operating normally
Win2012R2_VM2                  Running 0           4096              2.01:26:30 Operating normally
PL_Firewall-New                Running 0           6064              1.15:31:48 Operating normally
PL-New-Firewall                Running 1           8192              1.17:46:09 Operating normally
PL-WinVM1                      Running 1           10240             2.01:02:28 Operating normally
PL-WinVM2                      Running 0           10240             2.00:58:55 Operating normally
PL-WinVm3 DBServer             Running 1           65536             2.00:52:43 Operating normally
PL-WinVM4                      Running 3           10240             1.16:37:24 Operating normally
PL-WinVM5                      Running 0           10240             2.00:45:57 Operating normally
PL-WinVM-6 2008                Running 0           10240             1.16:30:10 Operating normally
PL-Test VM                     Running 0           4096              1.21:38:39 Operating normally
Tmk-VM1                        Running 0           2048              2.01:28:48 Operating normally
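Both nodes report storage_team as Degraded, and on the first node the VM_Network_Logic switch team is Degraded too. Format-List output such as Get-NetLbfoTeam's is easy to turn into records, which makes it simple to list every team that is not Up across many hosts. A minimal Python sketch, assuming "Key : Value" lines with blank lines between records (wrapped continuation lines are ignored). To see why a team is degraded, the per-member view from Get-NetLbfoTeamMember, in particular its OperationalStatus and FailureReason properties, is the place to look.

```python
def parse_list_output(text):
    """Parse Format-List style output ("Key : Value" lines) into a list of dicts.

    Records are separated by blank lines; lines without a colon (wrapped
    continuations) are ignored in this sketch.
    """
    records, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            if current:
                records.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        records.append(current)
    return records


def degraded_teams(get_netlbfoteam_text):
    """Names of LBFO teams whose Status is anything other than Up."""
    return [r.get("Name") for r in parse_list_output(get_netlbfoteam_text)
            if r.get("Status") != "Up"]
```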



Expert Comment

by:Philip Elder
ID: 42158955
Are both of the above results from the Hyper-V nodes? Or is the bottom one from the storage server?
 

Author Comment

by:Sri M
ID: 42158985
Both are from Hyper-V nodes. Should I provide the output from the storage server as well?

Regards

Expert Comment

by:Philip Elder
ID: 42159628
I'm a bit confused. Why are the outputs from the two Hyper-V nodes so dissimilar?

Normally, we configure our Hyper-V hosts identically within the same generation, and use an identical series of components for the next-generation configuration.
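One way to make such differences visible is to compare the adapter vendors each host reports. A rough Python sketch, assuming the InterfaceDescription values have been collected per node, for example with Get-NetAdapter | Select-Object -ExpandProperty InterfaceDescription, which avoids the truncation seen in the table output. Comparing only the first word of each description is a deliberate simplification: it catches a different NIC vendor but not a different model or firmware.

```python
def vendors(descriptions):
    """First word of each InterfaceDescription, e.g. 'Intel(R)', 'HP', 'Emulex'."""
    return {d.split()[0] for d in descriptions if d.strip()}


def vendor_diff(node_a, node_b):
    """Adapter vendors present on one node but not on the other."""
    a, b = vendors(node_a), vendors(node_b)
    return {"only_a": sorted(a - b), "only_b": sorted(b - a)}
```

With the descriptions from the two nodes in this thread, it reports HP adapters only on the first node and Emulex adapters only on the second.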
 

Author Comment

by:Sri M
ID: 42159665
Hi Philip,

Thank you for your response

Allow me to explain

All Hyper-V nodes have the same configuration, except that the make is different. Each node has two dual-port 10 GbE cards and two quad-port 1 GbE cards. The first port on each 10 GbE card is teamed for hardware-failure redundancy. On the quad-port cards, the first port is used for the management network (not teamed), and the second port on both quad-port 1 GbE cards carries VLAN traffic (again teamed for redundancy).

This is one of our older clusters; it ran for almost two years without a problem, but lately things have started becoming unstable. We have also built a new 2016 cluster that follows Microsoft's recommendations: its hosts are identical and of the same generation, down to the components.

Let me know if you need further information

Regards
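The intended layout described above can be checked against the Members field that Get-NetLbfoTeam reports. In the posted outputs, storage_team has four members on each node, two of them disconnected, which is likely enough on its own to leave the team Degraded. A minimal Python sketch; the intended member names are supplied by hand, and mapping "first port" to names such as Slot 02 Port 1 is an assumption (the first node uses generic names like Ethernet 2):

```python
def parse_members(members_field):
    """Turn '{A, B, C}' from Get-NetLbfoTeam into a set of adapter names."""
    return {m.strip() for m in members_field.strip().strip("{}").split(",") if m.strip()}


def team_drift(intended, members_field):
    """Compare the intended team members with what the team actually contains."""
    actual = parse_members(members_field)
    wanted = set(intended)
    return {
        "unexpected": sorted(actual - wanted),  # in the team but not in the design
        "missing": sorted(wanted - actual),     # in the design but not in the team
    }
```

Called with the two Port 1 adapters as the intended members and the second node's Members field, it reports Slot 02 Port 2 and Slot 03 Port 2 as unexpected.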

Accepted Solution

by:Philip Elder
ID: 42160162
Please have a look at my EE article: Some Hyper-V Hardware and Software Best Practices.

The host should only have one IP address via port 0 on the two Gigabit NICs.

One node has Emulex while the other does not?

What changes were made at the time things became unstable? Is there a change control list? Step back through that change log to find what caused the instability.
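Stepping back through a change log amounts to a loop over the changes from newest to oldest, reverting one at a time and watching whether the instability returns. A minimal Python sketch, assuming each change is logged as a separate, individually reversible entry; stable_after_reverting is a hypothetical callback standing in for the manual work of reverting one change and observing the cluster. Since the problem here recurred roughly every 10 to 15 days, each observation window would need to be longer than that before a step counts as stable.

```python
from typing import Callable, Optional, Sequence


def back_step(changes: Sequence[str],
              stable_after_reverting: Callable[[str], bool]) -> Optional[str]:
    """Try reverting logged changes newest-first until the cluster stays stable.

    `changes` is ordered oldest to newest. The callback (hypothetical) is
    expected to revert the given change, observe the cluster, and leave the
    revert in place, so reverts accumulate as the loop moves to older changes.
    Returns the change whose reversal restored stability, or None.
    """
    for change in reversed(changes):
        if stable_after_reverting(change):
            return change
    return None
```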
 

Author Comment

by:Sri M
ID: 42161469
Hello Philip

Thank you for your valuable input. Understood: we will review this setup and fix the differences. I will get back to you if that doesn't fix the instability.

Regarding reverting the changes, it's hard to say: we made many changes in one go, such as updates and node hardware upgrades. Reverting all of them would be risky for us now; instead, we are planning to bring all nodes to the same configuration and see how it goes from there.

Since this might take a long time, I will close this question.

I really appreciate your time in this regard.

Have a good day

Best Regards.
 

Author Closing Comment

by:Sri M
ID: 42161470
Thank you and the information shared is really helpful
Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Ransomware is a malware that is again in the list of security  concerns. Not only for companies, but also for Government security and  even at personal use. IT departments should be aware and have the right  knowledge to how to fight it.
Previously, on our Nano Server Deployment series, we've created a new nano server image and deployed it on a physical server in part 2. Now we will go through configuration.
This tutorial will walk an individual through the process of installing the necessary services and then configuring a Windows Server 2012 system as an iSCSI target. To install the necessary roles, go to Server Manager, and select Add Roles and Featu…
This course is ideal for IT System Administrators working with VMware vSphere and its associated products in their company infrastructure. This course teaches you how to install and maintain this virtualization technology to store data, prevent vuln…
Suggested Courses

647 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question