Tracking down 100% CPU usage on a vSphere host

Ben Conner
Hi,

I am experiencing 100% CPU usage on a vSphere host that typically never exceeds 15%. Looking at the performance overview, the VMs themselves are consuming very little, under 10% total.

Any suggestions on how to track this down? The host is on 6.7.0, with VCSA managing it.

The only thing that changed recently was an attempt to add an iSCSI device. I thought it had worked, but I'm now not seeing it in the list of datastores.

I am doing some Veeam replications right now but those have never been CPU intensive historically.

And now it has dropped down to normal levels: 1.21 GHz out of 15.19 GHz available. Sure would like to find out what happened.
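To put those two figures on the same scale as the 15% baseline mentioned earlier, here is a minimal sketch of the arithmetic. It assumes both numbers are in the same unit; the function name is hypothetical.

```python
def utilization_percent(used, capacity):
    """Return used/capacity as a percentage; both values must share a unit."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * used / capacity


# Figures quoted in the post: 1.21 used out of 15.19 available.
print(f"{utilization_percent(1.21, 15.19):.1f}%")
```

That works out to roughly 8%, which is in line with a host that normally stays under 15%.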

--Ben
Andrew Hancock (VMware vExpert / EE Fellow), VMware and Virtualization Consultant
Fellow 2018
Expert of the Year 2017
Commented:
100% CPU right after you've enabled iSCSI can be the result of iSCSI polling!

It spirals out of control, polling and polling and polling, and CPU goes through the roof...

(You get the idea!)

This happens when iSCSI has been misconfigured or a LUN has dropped off. For example, configure it all so it's working, then turn off the SAN, and you get the same result!

It will hamper the performance of ALL VMs on the host!

ESXi keeps searching (polling) for the SAN, and until it stops, it drives CPU out of control... Eventually it gives up and stops!
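One way to check for a LUN that has dropped off is to look at the device status the host reports. The sketch below parses text in the style of `esxcli storage core device list` and flags any device whose Status is not "on". The exact field layout is an assumption, the sample text and device IDs are hypothetical, and the function names are made up for illustration.

```python
def parse_device_list(text):
    """Parse indented 'Key: Value' blocks into {device_id: {field: value}}.

    Assumes each device starts with an unindented ID line followed by
    indented 'Key: Value' lines, as esxcli list output is usually laid out.
    """
    devices = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # Unindented line: start of a new device block.
            current = line.strip()
            devices[current] = {}
        elif current is not None and ":" in line:
            key, _, value = line.strip().partition(":")
            devices[current][key.strip()] = value.strip()
    return devices


def unhealthy_devices(devices):
    """Return sorted device IDs whose Status field is missing or not 'on'."""
    return sorted(
        dev for dev, fields in devices.items()
        if fields.get("Status", "").lower() != "on"
    )


# Hypothetical sample, not real output from any host.
SAMPLE = """\
naa.hypothetical0001
   Display Name: Example iSCSI Disk (naa.hypothetical0001)
   Status: dead
   Is Offline: false
naa.hypothetical0002
   Display Name: Local Disk (naa.hypothetical0002)
   Status: on
   Is Offline: false
"""

print(unhealthy_devices(parse_device_list(SAMPLE)))
```

On a real host the text would come from running the command over SSH and passing its output to `parse_device_list`.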
Ben Conner, CTO, SAS developer

Author

Commented:
It did quit about when the Veeam replication job finished.  Could have been a coincidence.

iSCSI vs. NFS: is one more efficient than the other?

I did make the iSCSI device too small. I had it set to GB instead of TB when I defined it. Oops.

Fixing that now.

Even though the CPUs were pegged, I saw no noticeable sluggishness on the VMs. Impressive design.

Trying to unmount the iSCSI volume now. Taking a good while. Huh.

Looks like it isn't going to unmount. Been several minutes now.
Ben Conner, CTO, SAS developer

Author

Commented:
Yeah, the host couldn't communicate with the iSCSI target for the unmount:
"An error occurred while communicating with the remote host".

Is there any way to remove the datastore without the cooperation of the target?
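For reference, a commonly described host-side cleanup sequence is: unmount the datastore, detach the device, remove the target from dynamic discovery, then rescan the adapter. The sketch below only builds and prints those `esxcli` command lines; it does not run them. The label, device ID, adapter name, and target address are all hypothetical placeholders, and this thread does not confirm whether each step succeeds while the target is unreachable.

```python
import shlex


def removal_commands(label, device_id, adapter, target_address):
    """Build (not run) esxcli commands for a host-side datastore cleanup."""
    return [
        # 1. Unmount the VMFS datastore by its volume label.
        ["esxcli", "storage", "filesystem", "unmount", "-l", label],
        # 2. Detach the backing device from the host.
        ["esxcli", "storage", "core", "device", "set", "--state=off", "-d", device_id],
        # 3. Remove the target from the adapter's dynamic discovery list.
        ["esxcli", "iscsi", "adapter", "discovery", "sendtarget", "remove",
         f"--adapter={adapter}", f"--address={target_address}"],
        # 4. Rescan the adapter so the host drops the stale paths.
        ["esxcli", "storage", "core", "adapter", "rescan", f"--adapter={adapter}"],
    ]


# All four values below are hypothetical placeholders.
for cmd in removal_commands(
    label="example-datastore",
    device_id="naa.hypothetical0001",
    adapter="vmhba64",
    target_address="192.0.2.10:3260",
):
    print(shlex.join(cmd))
```

Printing rather than executing keeps the sketch safe to run anywhere; review each command against the host's own device and adapter names before using it.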
Andrew Hancock (VMware vExpert / EE Fellow), VMware and Virtualization Consultant
Fellow 2018
Expert of the Year 2017

Commented:
When ESXi and iSCSI go wrong, they really go wrong...

This is why we have certified iSCSI SANs. Use a non-certified storage solution and your mileage may vary...

iSCSI by nature has to encapsulate every SCSI command in a packet and send it down the wire, which is handled by the CPU, but with today's CPUs this is usually not a significant overhead.
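To make the encapsulation step concrete, here is a minimal sketch that wraps a SCSI READ(10) command in the 48-byte Basic Header Segment of an iSCSI SCSI Command PDU, following the field layout in the iSCSI RFC. It is an illustration only, not how ESXi's initiator is implemented. The function names are hypothetical, it handles only LUNs below 256, and it leaves out login, digests, and the TCP transport.

```python
import struct


def read10_cdb(lba, blocks):
    """Build a 10-byte SCSI READ(10) CDB: opcode, flags, LBA, group, length, control."""
    return struct.pack(">BBIBHB", 0x28, 0, lba, 0, blocks, 0)


def scsi_command_bhs(cdb, lun, task_tag, expected_len, cmd_sn, exp_stat_sn, read=True):
    """Wrap a CDB in a 48-byte iSCSI SCSI Command Basic Header Segment."""
    if len(cdb) > 16:
        raise ValueError("CDBs longer than 16 bytes need an AHS, not covered here")
    if not 0 <= lun < 256:
        raise ValueError("this sketch only encodes LUNs below 256")
    flags = 0x80 | (0x40 if read else 0x20)   # Final bit plus Read or Write bit
    lun_field = bytes([0, lun]) + bytes(6)     # simple single-level LUN encoding
    return struct.pack(
        ">BBHB3s8sIIII16s",
        0x01,                  # opcode: SCSI Command
        flags,
        0,                     # reserved
        0,                     # TotalAHSLength
        b"\x00\x00\x00",       # DataSegmentLength (no immediate data)
        lun_field,
        task_tag,              # Initiator Task Tag
        expected_len,          # Expected Data Transfer Length in bytes
        cmd_sn,
        exp_stat_sn,
        cdb.ljust(16, b"\x00"),  # CDB padded to 16 bytes
    )


# Read 8 blocks of 512 bytes starting at LBA 2048 on LUN 0.
pdu = scsi_command_bhs(read10_cdb(2048, 8), lun=0, task_tag=1,
                       expected_len=8 * 512, cmd_sn=1, exp_stat_sn=1)
print(len(pdu), pdu.hex())
```

Every read or write a VM issues goes through a wrapping step like this on the software initiator, which is where the CPU cost comes from.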

iSCSI versus NFS in terms of performance depends on many things, including jumbo frames and the SAN at the other end.

What you have is an emulated SAN... NFS with jumbo frames may perform better, but you'd have to test, and test different workloads.

Again, we would never use it for VMs in production.
Ben Conner, CTO, SAS developer

Author

Commented:
I'm going to close this one out since Andrew identified the root cause. I was able to see the communication issues with Veeam ONE Monitor.
