High CPU on VM during overnight process

Hi,

Unfortunately I can't give too much detail on the overnight process. It runs a program that updates various Oracle database tables. What is occurring, though, is that during that process the CPU is at 80% or above according to native VMware monitoring, and quite often at 100% according to some third-party monitoring (sampling rate difference?). The application (Clarity), which I know little about, fails in its process when set higher than 10 processes (error 500, generic error). There is nothing in any Windows logs or IIS logs to verify this.

The VM (Windows 2008 R2) is configured for CPU as 4 sockets, 1 core each. I've been advised this is a bad configuration, as contention will occur. The host has several servers configured like this, and I have isolated the VMs as much as I can. The host has 8 CPUs, so we do have some contention issues. The question is: what do I do about this problem? Is the error likely due to very high CPU on the VM? I know it's hard to determine from the info I've provided. Can and should I change the VM CPU specs? Can I do this without damaging the OS?
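To put a number on the contention described above, the usual measure is the ratio of allocated vCPUs to physical CPUs on the host. A minimal Python sketch follows. The count of four VMs is a made-up figure for illustration (the post only says "several"), and `vcpu_to_pcpu_ratio` is a hypothetical helper name.

```python
def vcpu_to_pcpu_ratio(vcpus_per_vm, physical_cpus):
    """Return total allocated vCPUs divided by physical CPUs on the host."""
    if physical_cpus <= 0:
        raise ValueError("physical_cpus must be positive")
    return sum(vcpus_per_vm) / physical_cpus


# Assumption: four VMs with 4 vCPUs each on the 8-CPU host.
# The real number of VMs is not stated in the post.
example_vms = [4, 4, 4, 4]
ratio = vcpu_to_pcpu_ratio(example_vms, physical_cpus=8)
print(f"vCPU:pCPU ratio = {ratio:.1f}:1")
```

A ratio above 1:1 does not by itself prove a problem, but the higher it goes, the longer vCPUs may wait to be scheduled. CPU ready time in the vSphere performance charts is the metric to check.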

I'm not getting any help from the application team, so sorry for the vagueness of this question. They are blaming the failures on an upgrade of NetApp Data ONTAP (it only started failing after this upgrade). However, the storage admin says they are clutching at straws and sees no load on the filer. As I see the CPU on the VM shoot up as high as 100% when they run the overnight process, I'm concerned that this is why it's failing when set above 10 processes. We have rebooted the host and the VMs to no avail. I'm a bit lost now on how to approach this. Hancock out there? :) He's been a great help in the past.

I'm guessing Performance Monitor might be an answer :)

Any help appreciated, thanks.
philb19 Asked:

Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
You can certainly try adding more vCPUs; this will not break the VM. You are already running a multiprocessor HAL, so an added vCPU will be detected after a reboot, unless you have Hot Plug enabled in the VM and the OS supports hot-plug CPU.

Looking at the VM performance will show where the performance bottleneck is. Are you running NFS or iSCSI? Is there any latency in the performance charts on the filer?

Are Jumbo Frames in use?
philb19 (Author) Commented:
Hi Andrew, thanks for the response. I'm wondering whether too many vCPUs is the problem (4 sockets, 1 core each). So "removing" vCPUs was more the question. Is there any reason this would 1. assist performance, and 2. be safe to do without breaking the OS?

Hot plug is not enabled, so yes, it means a shutdown.

Running NFS, no jumbo frames. The storage admin says there is no latency.

thanks
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
Removing vCPUs is the same: plug and play, so no issues.

See the links further down about too many CPUs in a VM. I would also recommend the use of Jumbo Frames with NFS and NetApp (MTU 9000), provided all your network switching can support it.
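As a rough illustration of why MTU 9000 is often suggested for NFS traffic, the sketch below compares per-frame overhead at MTU 1500 and MTU 9000. It assumes plain IPv4 and TCP headers with no options and ignores NFS/RPC headers, so treat the figures as approximate; the function names are hypothetical.

```python
import math

# Per-frame Ethernet cost on the wire: preamble + SFD (8), header (14),
# FCS (4), inter-frame gap (12). No VLAN tag assumed.
ETHERNET_OVERHEAD = 38
# IPv4 header (20) + TCP header (20), assuming no options.
IP_TCP_HEADERS = 40


def payload_efficiency(mtu):
    """Fraction of on-wire bytes that carry TCP payload for a full-size frame."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETHERNET_OVERHEAD)


def frames_needed(payload_bytes, mtu):
    """Number of frames needed to carry payload_bytes of TCP payload."""
    return math.ceil(payload_bytes / (mtu - IP_TCP_HEADERS))


for mtu in (1500, 9000):
    print(mtu, f"{payload_efficiency(mtu):.3f}", frames_needed(1_000_000, mtu))
```

The efficiency gain is small; the larger effect is the drop in frame count, which means fewer packets for the host, switches, and filer to process per byte transferred. Every device in the path has to be configured for the larger MTU for this to work.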

What version of ONTAP are you running? 7-Mode? And what filer?

vSMP (virtual SMP) can affect virtual machine performance when too many vCPUs are added to virtual machines that cannot use them effectively. Examples of servers that can use vSMP correctly: SQL Server, Exchange Server.

Many VMware administrators think adding lots of processors will increase performance. Wrong! (And because they can, they just go silly!) Sometimes there is confusion between cores and processors, but what we are adding is additional processors in the virtual machine.

So 4 vCPUs on the VM is a 4-way SMP (quad-processor server). If you have an Enterprise Plus license you can add 8 (and only if you have the correct OS license will the OS recognise them all).

If applications can take advantage of them, e.g. Exchange or SQL, adding additional processors can/may increase performance.

So the usual rule of thumb is to try 1 vCPU, then try 2 vCPUs, and knock back to 1 vCPU if performance is affected. Only use vSMP if the VM can take advantage of it.

Example: a VM with 4 vCPUs allocated.

My simple layman's explanation of the scheduler:

As you have assigned 4 vCPUs to this VM, the VMware scheduler has to wait until 4 cores are free and available. To do this, it has to pause the first cores until the 4th is available, and during this timeframe the paused cores are not available for other processes. This is my simplistic view, but the bottom line is that adding more vCPUs to a VM may not give you the performance benefits you think, unless the VM and its applications are optimised for additional vCPUs.
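The simplified picture above can be turned into a toy calculation. This is only a sketch of the strict "all vCPUs at once" view described in this comment; it is not how the real ESXi scheduler is implemented, since current versions use relaxed co-scheduling, which softens the effect. It assumes each physical core is busy independently with the same probability, and the 60% busy figure is invented for illustration.

```python
from math import comb


def prob_at_least_k_idle(total_cores, busy_prob, k):
    """Probability that at least k of total_cores are idle at a given instant,
    assuming each core is busy independently with probability busy_prob."""
    idle_prob = 1.0 - busy_prob
    return sum(
        comb(total_cores, i) * idle_prob**i * busy_prob ** (total_cores - i)
        for i in range(k, total_cores + 1)
    )


# Assumption: 8 physical cores, each busy 60% of the time (made-up load).
for vcpus in (1, 2, 4):
    chance = prob_at_least_k_idle(8, 0.6, vcpus)
    print(f"{vcpus} vCPU(s): chance enough cores are free = {chance:.2f}")
```

In this toy model the chance of finding enough free cores falls as the vCPU count rises, which is the intuition behind trying fewer vCPUs first on a busy host.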

See here
http://www.vmware.com/resources/techresources/10131

see here
http://www.gabesvirtualworld.com/how-too-many-vcpus-can-negatively-affect-your-performance/

http://www.zdnet.com/virtual-cpus-the-overprovisioning-penalty-of-vcpu-to-pcpu-ratios-4010025185/

also there is a document here about the CPU scheduler

www.vmware.com/files/pdf/perf-vsphere-cpu_scheduler.pdf

https://blogs.vmware.com/vsphere/2013/10/does-corespersocket-affect-performance.html

philb19 (Author) Commented:
NetApp Release 8.1.4P8 7-Mode

FAS3240

Is there a problem with this version, perhaps?

Using SMP correctly: how do I know if this is the case?

thanks again
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
Try reducing the CPUs, and see if that's any better.

Check performance for the bottleneck.

What changes have caused this issue ?

Nothing different is being done on the VM now ?

How long has this been virtual ?
philb19 (Author) Commented:
Changes: well, the Data ONTAP upgrade coincided.

This turned into a disaster, in that the way dedupe metadata was handled changed. The upgrade was to fix an issue with dedupe metadata not being cleaned up. We couldn't run sis -s to delete the old metadata after the upgrade, because at the same time someone pulled a PDU and we lost all shelf power with all the VMDKs. This corrupted Exchange, so we didn't run sis -s that same weekend. In hindsight we should have (but it turned into a perfect storm). In effect, the new version recreated new dedupe metadata, left the old metadata in place, and filled to 100% the volume where this VM was. This of course brought everything down, including the DCs. I overcame this by making space and then refreshing all VMs, and of course the time was out, so I had to set the correct time on all VMs. A day later we ran sis -s to delete the old metadata.

Now this all occurred 2 weeks ago, and everything except this Clarity overnight process is fine. It works too, but only when they (the Clarity admin) set the number of tasks to 10. It's normally set at 30. Before this nightmare outage it was running consistently fine at 30, for years. Hence the finger pointing at storage/VMware.

The Clarity system (a project management system) has always been on VMware, for 5 years.

any clues :)
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
Okay, for test purposes, do you have the facility to move the VM to local storage, to check performance and rule out a NetApp disk I/O issue?
philb19 (Author) Commented:
I'm not sure there is a disk I/O issue though. The storage admin says there is no I/O problem. I can Storage vMotion to another volume. Local storage? No. You mean local disk on the ESX host?
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
Yes, local storage - this was so you could rule out the NetApp change or cock-up!
philb19 (Author) Commented:
The ESX hosts don't appear to have local storage available (at least nothing appears in the vCenter client). I was thinking of a Storage vMotion to the test storage.
In fact they have a test environment (with a test Clarity running the same processes) that they say still works at 30.
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
Okay, so in the test environment, what is different ?

Check if you have any local storage, it would be good to exclude the NetApp as the issue.
philb19 (Author) Commented:
As suspected, it turned out to be a database connection problem, with more connections required than were available on the Oracle database. Award points to Hancock for the help with the CPU info.
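For anyone hitting the same symptom, the arithmetic behind this kind of failure can be sketched as below. All numbers are hypothetical, since the thread does not state the actual Oracle limit or how many connections each Clarity process opens, and the function names are made up for illustration. On the Oracle side the ceiling is typically governed by the `processes` and `sessions` initialization parameters.

```python
def connections_required(processes, connections_per_process):
    """Total database connections the batch needs at a given process count."""
    return processes * connections_per_process


def max_processes(session_limit, connections_per_process, reserved=0):
    """Largest process count that fits under the limit, after setting aside
    'reserved' connections for other users of the database."""
    usable = session_limit - reserved
    if usable <= 0:
        return 0
    return usable // connections_per_process


# Hypothetical figures: limit of 150 sessions, 5 connections per process,
# 20 sessions already used by other applications.
limit, per_process, reserved = 150, 5, 20
for count in (10, 30):
    needed = connections_required(count, per_process)
    fits = needed <= limit - reserved
    print(f"{count} processes need {needed} connections, fits: {fits}")
print("max processes under the limit:", max_processes(limit, per_process, reserved))
```

With these made-up numbers a setting of 10 fits and a setting of 30 does not, which matches the pattern seen here even though the high CPU made the VM look like the culprit.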
Question has a verified solution.
