Guest OS not responding

Hi guys.

I am having an ongoing issue and I am out of ideas.

Hardware: HP ProLiant ML350p Gen8. Dual Xeon CPUs, 32 GB RAM, RAID 6 over 6 drives.

Software: It is running ESXi 5.5 with one guest OS (Server 2012 R2) and Acronis VM Protect as an appliance.

Issue: I have had it lock up on me a few times now without any real clues. The lockup is in the guest OS (Server 2012). I can't ping it, and I can't access it via console or RDP; the only option is to force a restart/shutdown of the guest OS. I can't see any issue in the VM logs using vSphere.

As for the guest OS, there are no bluescreens and no clues that I can see in the Event Viewer. I don't know what to do here. This server is running critical client data and I really can't afford to leave this issue unchecked.

I am thinking it has to be something in Windows, but honestly I don't know. Any suggestions at all would be awesome.

Zephyr ICT, Cloud Architect, commented:
Do the lock-ups happen randomly or at fixed hours, perhaps during backups? What kind of disks are you using? I see you mention 6 drives in RAID 6, but what kind of drives: 7.2k/10k/15k rpm?

What kind of network adapter for the VM? vmxnet3/E1000e?

This sounds either storage- or network-related, but we'll need some more info.
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
A couple of checks: are you using the OEM HP version of ESXi 5.5?

Have you updated all the firmware on the host server, using the latest firmware from HP?

Do you have a Battery Backed Write Cache module (BBWC)?

Is the storage controller a Smart Array?
Alwayslearningmore, Author, commented:
The drives are HP 600GB 6G SAS 10K 2.5in SC.

NIC is VMXNET3. I was using the E1000, but I had a purple screen of death a few times.

I am using ESXi 5.5 direct from VMware, not the HP OEM version.

I have not checked firmware updates as yet.

Yes, this RAID card does have a battery backup.
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
I would always recommend that the firmware is brought up to date with the HP Smart Start DVD ROM, and that you install the OEM HP version of ESXi.

Make sure the Smart Array BBWC is configured as 75% write and 25% read.

Also make sure VMware Tools is installed.

Is it just a single VM which hangs?

How do you know it's hung? What is the server role?

What resources have you allocated to the VM?
Alwayslearningmore, Author, commented:
Yep, it is just the single VM which hangs. I did have an RDP server, however this is powered off at the moment. It would not lock up when my locking-up VM did.

VMware Tools is installed.

When it locks up:
I get an alert from Control now to say the server is offline.
It won't ping internally or externally.
I can't RDP to it.
The databases which are running on the server no longer work.
The console won't display anything, just black.
The green light in vSphere is the only clue the machine is still powered on.
The login server cannot be contacted.

I have about 90% of the system resources allocated to this VM.

The server is AD, SQL Server and file server for 10 users.
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
90%? Can you be specific?

If you have allocated ALL the host RAM, this could be the issue.

So, how much RAM and CPU have been allocated to the VM?

Numbers please.

It's also not really recommended to have all those roles on one server. Is this an SBS server or just Windows 2012?

And you should have at least two Domain Controllers.
Alwayslearningmore, Author, commented:
Sure, we can do numbers.

The guest OS is Server 2012 R2.

24GB of RAM allocated out of 32GB
24 cores
650 GB of disk space: 150 GB for OS and 500 GB for data
One NIC in use in Windows
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
I think you really need to have a look at what resources your VM actually requires.

24GB and 24 cores seems a lot.

For an AD server with 10 users, 4GB and 1 vCPU should be fine.

For file and print, 4GB and 1 or 2 vCPUs should be fine.

I think I would look at whether you really need all that resource in a single VM!

I would also recommend splitting the VM into 3 servers: AD, File and Print, and SQL Server.
Alwayslearningmore, Author, commented:
Ok great,

So I don't have an issue with lack of resources. That's a good thing. Licensing issues are not going to let me split into three. I can split into two, no worries.

Overkill of resources is not going to kill my guest OS and make it lock up, though. So what is causing the lock-ups?

How can I identify my issue? Am I missing something simple? The suggestions above seem to be pointing to hardware. Is there anything in the guest OS I should be looking for? Or do you think that is the wrong place to be looking?
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
Reduce the resources.

Why have you set the resources (e.g. CPU and memory) to this in the VM?

Is the thinking that, if this was a server on this physical box, it would have all the memory and CPU, so I'll do the same thing under the hypervisor?

And if that is what you have set for this single VM, what is left for the Acronis appliance?

What is the overall percentage of resources available in the host summary?

Have you left at least 2GB for the host? If the host runs out of memory, it will start to swap to disk, and everything will become non-responsive and appear to lock up.
If your VM is oversubscribed, it can cause issues.
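
As a rough illustration of the memory budget, here is a minimal Python sketch. The 32GB host and 24GB VM come from this thread and the 2GB host reserve is the figure mentioned above; the Acronis appliance size is not stated anywhere here, so `acronis_guess` is purely an assumption, and both function names are made up for the example. It only does arithmetic on configured sizes and does not query ESXi.

```python
# Rough host-memory budget for the setup described in this thread.
# All figures are in GB. This is arithmetic only; nothing is read from ESXi.


def host_headroom_gb(host_ram_gb, vm_allocations_gb, hypervisor_reserve_gb=2.0):
    """Return the RAM left after all VM allocations and a reserve for the host."""
    return host_ram_gb - sum(vm_allocations_gb) - hypervisor_reserve_gb


def is_overcommitted(host_ram_gb, vm_allocations_gb, hypervisor_reserve_gb=2.0):
    """True when configured VM memory plus the host reserve exceeds physical RAM."""
    return host_headroom_gb(host_ram_gb, vm_allocations_gb, hypervisor_reserve_gb) < 0


if __name__ == "__main__":
    host_ram = 32.0       # from the thread
    windows_vm = 24.0     # from the thread
    acronis_guess = 4.0   # ASSUMPTION: appliance size is not given in the thread
    allocations = [windows_vm, acronis_guess]
    print("Headroom (GB):", host_headroom_gb(host_ram, allocations))
    print("Overcommitted:", is_overcommitted(host_ram, allocations))
```

If the headroom comes out negative, the host would have to reclaim or swap memory once the VMs actually touch everything they have been given, which is the situation described above.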

vSMP (virtual SMP) can hurt virtual machine performance when you add too many vCPUs to virtual machines that cannot use them effectively. Examples of servers that can use vSMP correctly: SQL Server, Exchange Server.

Many VMware administrators think adding lots of processors will increase performance. Wrong! (And because they can, they just go silly!) Sometimes there is confusion between cores and processors, but what we are adding is additional processors in the virtual machine.

So 4 vCPUs assigned to the VM make it a 4-way SMP (quad-processor server). If you have an Enterprise Plus license you can add 8 (and the OS will only recognise them all if you have the correct OS license).

If applications can take advantage of them (e.g. Exchange, SQL), adding additional processors can/may increase performance.

So the usual rule of thumb is: try 1 vCPU, then try 2 vCPUs, and knock back to 1 vCPU if performance is affected. Only use vSMP if the VM can take advantage of it.

Example: a VM with 4 vCPUs allocated.

My simple layman's explanation of the scheduler:

As you have assigned 4 vCPUs to this VM, the VMware scheduler has to wait until 4 cores are free and available. To do this, it has to pause the first cores until the 4th is available, and during this timeframe the paused cores are not available for other processes. This is my simplistic view, but the bottom line is that adding more vCPUs to a VM may not give you the performance benefits you expect unless the VM and its applications are optimised for additional vCPUs.
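
To put rough numbers on that simplistic view, here is a toy Python calculation. It is not how the ESXi scheduler is implemented (the real one is more relaxed than strict all-or-nothing scheduling). It assumes each physical core is free independently with the same probability, and that a VM can only run when at least as many cores are free as it has vCPUs. The 8-core host and the 50% figure are hypothetical, and `prob_can_schedule` is a name made up for the example.

```python
from math import comb


def prob_can_schedule(total_cores, vcpus, p_core_free):
    """Probability that at least `vcpus` of `total_cores` are free at one instant.

    Toy model: every core is free independently with probability p_core_free,
    and the VM needs all of its vCPUs placed at the same time.
    """
    if vcpus > total_cores:
        return 0.0
    return sum(
        comb(total_cores, k) * p_core_free**k * (1 - p_core_free) ** (total_cores - k)
        for k in range(vcpus, total_cores + 1)
    )


if __name__ == "__main__":
    cores = 8        # HYPOTHETICAL host core count, not taken from the thread
    p_free = 0.5     # HYPOTHETICAL chance that any one core is idle
    for vcpus in (1, 2, 4, 8):
        chance = prob_can_schedule(cores, vcpus, p_free)
        print(f"{vcpus} vCPU(s): schedulable {chance:.1%} of the time")
```

Under these assumptions the chance of finding enough free cores drops quickly as the vCPU count grows, which is the intuition behind starting small.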

See here

see here

also there is a document here about the CPU scheduler

Alwayslearningmore, Author, commented:
Awesome thanks. Lots to read, I will come back with questions after lunch.

I appreciate your time and help.
Alwayslearningmore, Author, commented:

Is the thinking that, if this was a server on this physical box, it would have all the memory and CPU, so I'll do the same thing under the hypervisor ?

Yes this is my general thinking.

However I have left some resources.

Acronis and the host have 8GB of RAM between them so I believe they are fine in that area.

I will drop the CPUs on my VM. It appears I have them set up correctly. But by how much?
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
Can you screenshot and show me the summary information for the host?

Is the thinking that, if this was a server on this physical box, it would have all the memory and CPU, so I'll do the same thing under the hypervisor ?

Yes this is my general thinking.

Well, that's not how we normally provision VMs.

We add the amount of CPU and Memory that is required, and then monitor and check performance, and add more if required.

e.g. start with 1 vCPU  and 4GB RAM.
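
The "start small, monitor, add more if required" approach can be sketched in Python as below. This is only an illustration: the 80% threshold, the step sizes (one vCPU, 2GB) and the names `VmSize` and `next_size` are assumptions made for the example, and the utilisation figures would have to come from whatever monitoring you use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmSize:
    vcpus: int
    ram_gb: int


def next_size(current, avg_cpu_pct, avg_mem_pct, high_water_pct=80.0):
    """Suggest the next allocation after a monitoring period.

    Grows a resource by one step only when its average utilisation stayed
    above the (assumed) high-water mark; otherwise leaves it unchanged.
    """
    vcpus = current.vcpus + 1 if avg_cpu_pct > high_water_pct else current.vcpus
    ram_gb = current.ram_gb + 2 if avg_mem_pct > high_water_pct else current.ram_gb
    return VmSize(vcpus, ram_gb)


if __name__ == "__main__":
    size = VmSize(vcpus=1, ram_gb=4)  # starting point suggested above
    # HYPOTHETICAL monitoring results for three review periods: (CPU %, memory %)
    for cpu_pct, mem_pct in [(95.0, 60.0), (85.0, 90.0), (50.0, 55.0)]:
        size = next_size(size, cpu_pct, mem_pct)
        print(size)
```

The point of the sketch is the direction of travel: allocations grow from a small baseline in response to measured load, rather than starting at the size of the physical box.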

Virtualisation is about the consolidation of servers, and sharing the host's resources across multiple servers.

see my EE Article

HOW TO:  Performance Monitor vSphere 4.x or 5.0
Alwayslearningmore, Author, commented:
Reducing the resources resolved the issue. Thanks for all your help and guides; I appreciate your time.