David Johnson, CD (Canada) asked:

Hyper-V Machines in Stopping State

Why are my Hyper-V VMs stuck in the Stopping state?
It's not disk space: 472 GB of 1.09 TB is free.
I have 30 VMs, and 14 of them are in the Stopping state.

Server 2016 Datacenter
HP DL380 G6
96 GB RAM
2x Xeon E5649 6-core CPUs
Tom Cieslik (United States):

Hi David.
You said you have 30 VMs, right?
Your host has 2 processors with 6 cores each, so 2 x 6 x 2 (Hyper-Threading) = 24 threads (virtual processors).
The Microsoft VM best-practice rule is not to go beyond the virtual processor limit, keeping 2 VPs for the host.
Mathematically: maximum VPs for guests = total VPs - 2 for the host OS.
In your case, IF YOU ASSIGNED ONLY ONE VIRTUAL PROCESSOR (THREAD) PER VM, you are allowed to have 24 - 2 = 22 VMs.
If some of your VMs have more than one VP assigned, the problem is even bigger.
This may be your problem: you have too many VMs on a single host for this hardware configuration!
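The arithmetic in the reply above can be checked with a short Python sketch. It encodes the rule of thumb exactly as stated in the reply (logical processors minus 2 reserved for the host); it has not been checked against Microsoft documentation. The function names are hypothetical, Hyper-Threading (2 threads per core) is assumed as in the reply, and the 30 x 1 vCPU figure is taken from the asker's follow-up.

```python
# Sketch of the vCPU rule of thumb stated in the reply (not an official formula).

def logical_processors(sockets: int, cores_per_socket: int, threads_per_core: int = 2) -> int:
    """Threads the host exposes; assumes Hyper-Threading (2 threads per core)."""
    return sockets * cores_per_socket * threads_per_core


def vcpu_budget(lps: int, host_reserve: int = 2) -> int:
    """vCPUs left for guests after reserving `host_reserve` for the host OS."""
    return lps - host_reserve


def assigned_vcpus(vcpus_per_vm: list[int]) -> int:
    """Total vCPUs assigned across all VMs."""
    return sum(vcpus_per_vm)


lps = logical_processors(sockets=2, cores_per_socket=6)  # 2 x 6 x 2 = 24
budget = vcpu_budget(lps)                                 # 24 - 2 = 22
assigned = assigned_vcpus([1] * 30)                       # 30 VMs x 1 vCPU = 30
print(lps, budget, assigned, assigned > budget)           # prints: 24 22 30 True
```

Under this rule the host is over budget by 8 vCPUs even with one vCPU per VM.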
On the host, bring up ResMon.exe and go to the Disk tab. Does Disk Activity show any VHDX files being worked on?
What do the event logs say?
David Johnson, CD (asker):

1 vCPU per machine. Disk and CPU contention would just make things run super slow.


I had to restart the host; the VMs restarted, and then a bunch of temporary Altaro snapshots were removed.

An Altaro backup was running at the time. One lab uses Altaro and another uses Veeam B&R.
ASKER CERTIFIED SOLUTION: Tom Cieslik (United States)

(The text of the accepted solution is available only to Experts Exchange members.)
I will split the VMs across the replica servers to see whether the vCPUs are the problem.
If Altaro snapshots were being taken at the time, then that's the source of the "issue". It would also have been readily apparent from the Disk Activity check.

What is the storage subsystem setup?

Break up the Altaro backup schedules to stagger the VMs' backups; don't run them all at once. A VSS snapshot is extremely I/O-intensive, hence the stalls.
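As a rough illustration of staggering, the Python sketch below splits a VM list into waves of at most `max_concurrent` backups, each wave starting `gap_minutes` after the previous one. It does not talk to Altaro or configure anything; it only produces start times you would enter into the backup schedules by hand. The function name, the 30-minute gap, the 22:00 start and the `VM01`..`VM30` names are all hypothetical placeholders. The limit of 2 mirrors the concurrency setting the asker mentions later in the thread.

```python
# Hypothetical stagger planner: waves of at most `max_concurrent` backups,
# each wave starting `gap_minutes` after the previous one.
from datetime import datetime, timedelta


def stagger_backups(vms, start, max_concurrent=2, gap_minutes=30):
    """Return a list of (wave_start_time, [vm names]) tuples."""
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    waves = []
    for i in range(0, len(vms), max_concurrent):
        wave_index = i // max_concurrent
        wave_start = start + timedelta(minutes=gap_minutes * wave_index)
        waves.append((wave_start, vms[i:i + max_concurrent]))
    return waves


vms = [f"VM{n:02d}" for n in range(1, 31)]  # 30 placeholder VM names
plan = stagger_backups(vms, datetime(2024, 1, 1, 22, 0))
for when, batch in plan[:3]:
    print(when.strftime("%H:%M"), batch)
# 22:00 ['VM01', 'VM02']
# 22:30 ['VM03', 'VM04']
# 23:00 ['VM05', 'VM06']
```

With 30 VMs and 2 per wave this yields 15 waves, the last starting at 05:00; in practice the gap should be longer than a typical VM backup so waves don't overlap.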

EDIT: As an FYI:
1: CPU is very rarely the bottleneck. The storage subsystem is almost always the place to start, with the storage-to-compute fabric being the next step. Since this is a standalone host, storage is it.
2: A VM should have at least two vCPUs assigned. If a runaway process hits a VM with only 1 vCPU assigned, the VM is toast: you can't get into it.
3: PerfMon has both host and in-guest counters. Use them to verify where each of the host's subsystems stands relative to load.
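For point 3, host-side counters can also be sampled from the command line with `typeperf`, which ships with Windows. The Python sketch below only builds and prints such a command for you to run on the Hyper-V host; the counter selection and the 5-second x 12-sample window are assumptions, not a recommended baseline. The two LogicalDisk counters cover storage latency and queued I/O (the storage-first check), and the Hyper-V hypervisor counter covers host CPU.

```python
# Sketch: build a typeperf command that samples host counters relevant to
# the storage-first advice. Prints the command; it does not run it.
import subprocess

COUNTERS = [
    r"\LogicalDisk(*)\Avg. Disk sec/Transfer",       # per-volume I/O latency
    r"\LogicalDisk(*)\Current Disk Queue Length",    # outstanding I/O requests
    r"\Hyper-V Hypervisor Logical Processor(_Total)\% Total Run Time",  # host CPU
]


def typeperf_args(counters, interval_s=5, samples=12):
    """typeperf argv: -si is the sample interval, -sc the number of samples."""
    return ["typeperf", *counters, "-si", str(interval_s), "-sc", str(samples)]


args = typeperf_args(COUNTERS)
# list2cmdline quotes counter paths that contain spaces, Windows-style.
print(subprocess.list2cmdline(args))
```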
The problem is that there was zero disk activity, and Altaro is set to run 2 simultaneous backups, down from the default of 4.
The catch for me is that we've run into Altaro and VSS snapshot issues in the past. I'm still leaning towards that being the source of the problem.

vssadmin, run from an elevated command prompt, can be used to delete all VSS snapshots (shadow copies). That's one of the troubleshooting steps I suggest taking.
vssadmin delete shadows /all


I have been using Veeam B&R, and given the hype about Altaro being its equal but easier to configure, I thought I'd give it a shot. I didn't notice the problem until I checked my mail and saw the Spiceworks alerts that the servers had been down for more than xx minutes. I went into Hyper-V Manager and found they were all in the Stopping state. Get-VM hung. Since this isn't production, I did a shutdown/restart.