Solved

Low Performance of ESXi 4.1 Server on Dell M710

Posted on 2011-03-11
Last Modified: 2012-06-27
Hi,
We have a Dell PowerEdge M710 blade server with 144GB of RAM and four SATA disk drives of 600GB each. I can't remember which RAID level was used, but the total disk space available is 1.63TB. We have installed ESXi 4.1 with the free license. After creating 10 virtual machines (VMs) with 4GB of RAM each, and running only 6 of them, ESXi has become very slow. All the VMs are so slow that even small programs take a very long time to install. I have heard that this may be caused by I/O (input/output) processing and may also be related to RAID. Can anyone help me figure out what the issue might be?

Thanks
Question by:mkalugotla
46 Comments
 
LVL 30

Expert Comment

by:IanTh
ID: 35110861
How many cores have you got?

Are the SATA drives on a RAID controller? If so, which hardware RAID controller are you using?
You are running RAID 5: 4 × 600 GB is 2400 GB raw, and RAID 5 gives up one disk's worth of capacity to parity, leaving 1800 GB. ESXi shows that in binary units, which is why you get 1.63 TB.
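A quick way to sanity-check the capacity figure is to compute it. This is a minimal Python sketch, assuming the usual rules that RAID 5 spends one disk on parity, RAID 6 two, and RAID 10 half the disks; it ignores controller metadata and VMFS formatting overhead, so real datastores come out slightly smaller. It also assumes the "TB" ESXi displays is a binary terabyte (TiB). The function names are illustrative, not from any tool.

```python
# Rough usable-capacity calculator for common RAID levels.
# Assumption: ignores controller metadata and VMFS formatting overhead.

def usable_bytes(level: str, disks: int, disk_bytes: float) -> float:
    """Usable capacity in bytes for a simple, single-span RAID array."""
    if level == "raid0":
        data_disks = disks
    elif level == "raid10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID 10 needs an even number of disks, at least 4")
        data_disks = disks // 2  # every block is mirrored
    elif level == "raid5":
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        data_disks = disks - 1  # one disk's worth of space holds parity
    elif level == "raid6":
        if disks < 4:
            raise ValueError("RAID 6 needs at least 4 disks")
        data_disks = disks - 2  # two disks' worth of space hold parity
    else:
        raise ValueError(f"unknown RAID level: {level}")
    return data_disks * disk_bytes


def to_tib(n_bytes: float) -> float:
    """Convert bytes to binary terabytes (TiB)."""
    return n_bytes / 2**40


if __name__ == "__main__":
    drive = 600 * 10**9  # a "600GB" drive as marketed: decimal gigabytes
    for level in ("raid5", "raid10"):
        b = usable_bytes(level, 4, drive)
        print(f"{level}: {b / 10**9:.0f} GB decimal = {to_tib(b):.2f} TiB")
```

For the four 600 GB drives here, RAID 5 gives 1800 GB, about 1.637 TiB, which lines up with the ~1.63 TB ESXi reports; RAID 10 on the same drives would give 1200 GB, about 1.09 TiB.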
LVL 30

Expert Comment

by:IanTh
ID: 35110873
Also, I have noticed that a Windows VM takes more ESX CPU time slices than a Linux one.
LVL 6

Expert Comment

by:zane_o
ID: 35110898
Are you sure those aren't SAS drives? I can't see anyone spending a small fortune on RAM and then going cheap on the drives. If they are SATA, I would get them swapped out.

What do the performance counters on the VMs look like? Is there a lot of CPU activity? Have you assigned more vCPUs than necessary to the VMs?

Could be the storage driver too.
LVL 8

Expert Comment

by:ragnarok89
ID: 35110916
Possible causes for poor performance:

- Not enough CPUs in the blade server
- Thin-provisioned VM disks that are trying to grow, but not enough free disk space on the ESX datastore
- Not enough RAID 5 arrays (5 VMs on a single RAID array might be too much, never mind 10)
LVL 9

Expert Comment

by:BDoellefeld
ID: 35110925
You didn't mention what roles these VMs serve. Anything with high I/O? This sounds like disk I/O and latency to me.

Take a look in ESXi by going to the Performance tab and changing the chart options to Disk. In the counters window, enable the I/O latency counters.

You want latency to stay below 20 ms, although it is normal for it to bounce above 20 ms sometimes. If it is sustained above 20 ms, that is more than likely your problem.
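To make "sustained" concrete, here is a small Python sketch that flags a run of latency samples above 20 ms. It assumes you have copied the latency counter values (in milliseconds) from the Performance tab into a plain list, and it treats "sustained" as a hypothetical minimum of three consecutive samples over the threshold; pick whatever run length matches your chart interval.

```python
from typing import Sequence


def sustained_high_latency(samples_ms: Sequence[float],
                           threshold_ms: float = 20.0,
                           min_run: int = 3) -> bool:
    """True if at least `min_run` consecutive samples exceed `threshold_ms`.

    A single spike is ignored; the run counter resets whenever a sample
    drops back to or below the threshold.
    """
    run = 0
    for sample in samples_ms:
        if sample > threshold_ms:
            run += 1
            if run >= min_run:
                return True
        else:
            run = 0  # back under the threshold: the streak is broken
    return False


# Occasional spikes: normal.
print(sustained_high_latency([5, 25, 8, 30, 10]))    # False
# Latency stuck above 20 ms: likely the bottleneck.
print(sustained_high_latency([12, 22, 35, 41, 18]))  # True
```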
LVL 9

Expert Comment

by:predragpetrovic
ID: 35110929
The VM performance issue in your case is caused by the SATA drives and the RAID level (but most likely by the SATA drives). SATA is not recommended for that many virtual machines on that RAID level with that number of disks. It also depends on the type of services running (for example, Exchange and SQL with their transaction logs, plus Active Directory and maybe SharePoint). That combination will degrade your performance drastically.

I would recommend either moving to RAID 1+0 (this means reformatting the array, losing its data, and rebuilding machines from backup, etc.) or buying storage with SAS or FC disks and migrating the servers.

Sorry for the bad news.
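The usual way to reason about RAID 5 versus RAID 1+0 for VM workloads is the write penalty: each host write costs about four back-end I/Os on RAID 5 (read data, read parity, write data, write parity) but two on RAID 10. A minimal Python sketch of that rule of thumb follows. The ~140 IOPS per 10K spindle and the 40% write mix are assumptions for illustration, not measurements from this server.

```python
# Rule-of-thumb host IOPS for a RAID array, using the classic write penalty.
# Assumptions: every write pays the full penalty (no controller cache
# absorbing it), and per-disk IOPS is a rough vendor-style figure.

WRITE_PENALTY = {"raid0": 1, "raid10": 2, "raid5": 4, "raid6": 6}


def host_iops(disks: int, iops_per_disk: float, write_frac: float, level: str) -> float:
    """Front-end IOPS the array can sustain for a given read/write mix.

    Back-end load = reads + writes * penalty must fit within the raw spindle
    IOPS, so host IOPS = raw / (read_frac + write_frac * penalty).
    """
    if not 0.0 <= write_frac <= 1.0:
        raise ValueError("write_frac must be between 0 and 1")
    raw = disks * iops_per_disk
    read_frac = 1.0 - write_frac
    return raw / (read_frac + write_frac * WRITE_PENALTY[level])


if __name__ == "__main__":
    for level in ("raid5", "raid10"):
        # 4 disks, ~140 IOPS each (assumed), 40% writes (assumed)
        print(level, round(host_iops(4, 140, 0.4, level)))
```

With those assumed numbers, RAID 5 comes out around 255 host IOPS versus about 400 for RAID 10, shared by every VM on the array. A battery-backed write-back cache can absorb much of the penalty, which this simple model ignores.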

Author Comment

by:mkalugotla
ID: 35111044
It's not SATA. What I have is: 4 × 600GB 2.5-inch 10K RPM, 6Gbps SAS Hot Plug Hard Drive

Configured as RAID 5, so logically I have 1.63TB of disk space.

Also, I am creating Windows VMs.


Author Comment

by:mkalugotla
ID: 35111066
CPU Cores : 12 CPUs X 2.659 GHz
Physical Processors : 2
Logical Processors : 24
LVL 30

Expert Comment

by:IanTh
ID: 35111068
You said SATA, not SAS; there is an I/O difference.

I expect your CPU cores are maxed out. You can see this on the Performance tab in the vSphere Client when you select the ESX server.
LVL 30

Expert Comment

by:IanTh
ID: 35111073
So how many vCPUs (virtual cores) have you assigned to the VMs?

Author Comment

by:mkalugotla
ID: 35111087
// so how many vcores have you added to the vm's
How can I check this? Please let me know.
LVL 6

Expert Comment

by:zane_o
ID: 35111101
In summary:
How many CPUs assigned per VM?
Did you thin provision the disks?
What do your CPU and Disk Counters reveal?
What are the functions of the Windows servers?
LVL 30

Expert Comment

by:IanTh
ID: 35111112
Go into the VM's settings.

Author Comment

by:mkalugotla
ID: 35111151
The summary of each VM shows 1 vCPU.
I chose 'Typical' VM creation, not 'Custom'.

Yes, I chose thin provisioning for the disks.

The Performance tab was showing data, but after powering on around 6 VMs and installing products like SharePoint, Active Directory, IIS, and Exchange, it shows 'performance data is currently not available for this entity'.

The Windows servers are running SharePoint, Exchange, an antivirus product, etc.
LVL 30

Expert Comment

by:IanTh
ID: 35111248
So the ESX server is being maxed out, most likely by all the installs, and that is really a one-off when you do an install.

Author Comment

by:mkalugotla
ID: 35111272
Can you please suggest a solution?
Do I need to increase the number of virtual processors when creating a VM?
LVL 30

Expert Comment

by:IanTh
ID: 35111292
Well, can't you just let the installations finish first?

Author Comment

by:mkalugotla
ID: 35111305
They are finished.
LVL 30

Expert Comment

by:IanTh
ID: 35111316
What, and the Performance tab is still not responding?

Author Comment

by:mkalugotla
ID: 35111423
I have attached the performance data. [Attachment: performance]
LVL 28

Accepted Solution

by:bgoering
ID: 35111433
With your setup I would expect any performance issue to be disk I/O related. With the workload you have described, it is unlikely you are maxing out either CPU or memory.

RAID 5 will not perform quite as well as RAID 10, but from what you have described it doesn't really sound like a RAID level issue either. A common cause of really, really atrocious performance is building a server without battery-backed write cache (BBWC) on the RAID controller, or with the controller incorrectly configured. I would guess you probably have a PERC 6/i controller. Check (1) that the battery module is installed and (2) that the controller is configured for "write back" mode (as opposed to write through mode).

Let me know what you find on the controller.

Good Luck
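The reasoning that CPU and memory are unlikely to be the bottleneck can be checked with simple arithmetic on the figures given in this thread (12 cores, 144 GB of RAM, 10 VMs with 1 vCPU and 4 GB each). A minimal Python sketch; it ignores hypervisor and per-VM memory overhead, so treat the ratios as lower bounds. The class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Host:
    cores: int      # physical cores, not hyperthreaded logical CPUs
    ram_gb: float


@dataclass
class VM:
    vcpus: int
    ram_gb: float


def overcommit(host: Host, vms: List[VM]) -> Tuple[float, float]:
    """Return (vCPUs per physical core, configured VM RAM / host RAM)."""
    vcpu_ratio = sum(vm.vcpus for vm in vms) / host.cores
    mem_ratio = sum(vm.ram_gb for vm in vms) / host.ram_gb
    return vcpu_ratio, mem_ratio


# Figures from this thread: 12 cores, 144 GB RAM, 10 VMs of 1 vCPU and 4 GB.
host = Host(cores=12, ram_gb=144)
vms = [VM(vcpus=1, ram_gb=4) for _ in range(10)]
cpu_ratio, mem_ratio = overcommit(host, vms)
print(f"vCPUs per physical core: {cpu_ratio:.2f}")       # 0.83
print(f"configured VM RAM / host RAM: {mem_ratio:.2f}")  # 0.28
```

Both ratios are well under 1, so neither CPU nor memory is overcommitted, which leaves disk I/O as the most likely suspect.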

Author Comment

by:mkalugotla
ID: 35111487
How can I check these from an SSH console? Please let me know.
LVL 28

Expert Comment

by:bgoering
ID: 35111493
You can easily tell whether you have the battery module by looking at the Health tab (see below). If it is present, there is no easy way to determine whether the configuration is write back. You would have to go into the controller setup at boot time (F8 maybe? There will be a message saying which key to press) to see and/or change the cache configuration.

[Screenshot: Hardware Health]
LVL 30

Expert Comment

by:IanTh
ID: 35111516
The Performance tab can show datastore, CPU, and RAM data.

Also, check whether your RAID battery is still charging, as that can affect the RAID controller's performance.

Author Comment

by:mkalugotla
ID: 35111561
[Screenshots: battery]

Author Comment

by:mkalugotla
ID: 35111636
It seems BBWC is not installed.
LVL 6

Expert Comment

by:zane_o
ID: 35111650
I agree I/O is likely the cause, although I doubt it is battery related. My wild guess is that when you installed your AV software, it started a full scan, and you had 6 full virus scans all running at the same time.
LVL 28

Expert Comment

by:bgoering
ID: 35111675
Hmm, I haven't actually used the CERC controller, but from your screenshot it doesn't appear to have a battery. You might want to review the section on caching at http://support.dell.com/support/edocs/software/smarrman/marb35/en/controll.htm:

"Write policy for PERC 2, 2/Si, 3/Si, 3/Di, and CERC SATA1.5/6ch controllers
Write Cache Enabled. When the write cache is enabled, the controller writes data to the write cache before writing data to the array disk. Because it takes less time to write data to the write cache than it does to a disk, enabling the write cache can improve system performance. Once data is written to the write cache, the system is free to continue with other operations. The controller, in the meantime, completes the write operation by writing the data from the write cache to the array disk. The Write Cache Enabled option is only available if the controller has a functional battery. The presence of a functional battery ensures that data can be written from the write cache to the array disk even in the case of a power outage.

Write Cache Disabled. This is the only available option if the controller does not have a functional battery. "
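The rule in the quoted Dell text, combined with the 'Force WB with no battery' override that comes up later in this thread, can be modeled as a small decision function. This is a hypothetical Python sketch of the behavior as described, not Dell's firmware logic; the names are made up for illustration.

```python
from enum import Enum


class WritePolicy(Enum):
    WRITE_THROUGH = "write through"
    WRITE_BACK = "write back"


def effective_policy(requested: WritePolicy,
                     battery_ok: bool,
                     force_wb_without_battery: bool = False) -> WritePolicy:
    """Policy the controller actually uses, per the quoted caching rule.

    Write back needs a functional battery unless the hypothetical
    `force_wb_without_battery` override (the 'Force WB with no battery'
    option) is set, in which case cached writes are lost on power failure.
    """
    if requested is WritePolicy.WRITE_THROUGH:
        return WritePolicy.WRITE_THROUGH
    if battery_ok or force_wb_without_battery:
        return WritePolicy.WRITE_BACK
    return WritePolicy.WRITE_THROUGH  # no battery: fall back to the safe mode


# A controller with no battery, as in this thread:
print(effective_policy(WritePolicy.WRITE_BACK, battery_ok=False).value)  # write through
print(effective_policy(WritePolicy.WRITE_BACK, battery_ok=False,
                       force_wb_without_battery=True).value)             # write back
```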

LVL 30

Expert Comment

by:IanTh
ID: 35111758
CERC is a built-in PERC, isn't it?
LVL 28

Expert Comment

by:bgoering
ID: 35111825
The PERC 6/i is the integrated, built-in PERC.

You might want to try a disk throughput tool like DiskBench (http://www.nodesoft.com/diskbench/) to actually measure your disk performance.

On my R710 (a rack version of your M710), the one I posted the health screen from, I get results like the following on a SATA RAID 5 array; with SAS disks yours should be a bit better.

10 MB Create File Bench
Starting Create File Bench...

Created file: dummy1
  Size: 251658240 bytes
  Time: 1438 ms
  Transfer Rate: 166.898 MB/s

Create File Bench ended

Create File Batch (with all default settings)
Starting Batch Create File Bench...

48 MB; C:\Users\Administrator.BGDOMAIN\AppData\Local\Temp\1\Test; 50331648 bytes; 281 ms; 170.819 MB/s
52 MB; C:\Users\Administrator.BGDOMAIN\AppData\Local\Temp\1\Test; 54525952 bytes; 297 ms; 175.084 MB/s
56 MB; C:\Users\Administrator.BGDOMAIN\AppData\Local\Temp\1\Test; 58720256 bytes; 297 ms; 188.552 MB/s
60 MB; C:\Users\Administrator.BGDOMAIN\AppData\Local\Temp\1\Test; 62914560 bytes; 313 ms; 191.693 MB/s
64 MB; C:\Users\Administrator.BGDOMAIN\AppData\Local\Temp\1\Test; 67108864 bytes; 219 ms; 292.237 MB/s
68 MB; C:\Users\Administrator.BGDOMAIN\AppData\Local\Temp\1\Test; 71303168 bytes; 359 ms; 189.415 MB/s
72 MB; C:\Users\Administrator.BGDOMAIN\AppData\Local\Temp\1\Test; 75497472 bytes; 359 ms; 200.557 MB/s
76 MB; C:\Users\Administrator.BGDOMAIN\AppData\Local\Temp\1\Test; 79691776 bytes; 266 ms; 285.714 MB/s
80 MB; C:\Users\Administrator.BGDOMAIN\AppData\Local\Temp\1\Test; 83886080 bytes; 391 ms; 204.604 MB/s
84 MB; C:\Users\Administrator.BGDOMAIN\AppData\Local\Temp\1\Test; 88080384 bytes; 391 ms; 214.834 MB/s

Create Batch File Bench ended

Let me know your results
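If you post DiskBench batch output like the above, the rates are easy to check and summarize. The "MB/s" column appears to use binary megabytes: 50331648 bytes is 48 MiB, and 48 MiB in 0.281 s is about 170.8 MiB/s, matching the reported 170.819. The Python sketch below parses lines in that semicolon-separated layout (an assumption based only on the sample above) and recomputes each rate; the function names are illustrative.

```python
from statistics import mean
from typing import Tuple


def parse_batch_line(line: str) -> Tuple[int, int, float]:
    """Split a DiskBench batch line into (size_bytes, elapsed_ms, reported_mb_s).

    Assumed layout: '<n> MB; <path>; <bytes> bytes; <ms> ms; <rate> MB/s'.
    """
    fields = [f.strip() for f in line.split(";")]
    size_bytes = int(fields[2].split()[0])
    elapsed_ms = int(fields[3].split()[0])
    reported = float(fields[4].split()[0])
    return size_bytes, elapsed_ms, reported


def mib_per_s(size_bytes: int, elapsed_ms: int) -> float:
    """Throughput in MiB/s (1 MiB = 1,048,576 bytes)."""
    return size_bytes / 2**20 / (elapsed_ms / 1000)


SAMPLE = [
    r"48 MB; C:\Users\Administrator.BGDOMAIN\AppData\Local\Temp\1\Test; 50331648 bytes; 281 ms; 170.819 MB/s",
    r"52 MB; C:\Users\Administrator.BGDOMAIN\AppData\Local\Temp\1\Test; 54525952 bytes; 297 ms; 175.084 MB/s",
    r"64 MB; C:\Users\Administrator.BGDOMAIN\AppData\Local\Temp\1\Test; 67108864 bytes; 219 ms; 292.237 MB/s",
]

rates = []
for line in SAMPLE:
    size, ms, reported = parse_batch_line(line)
    rates.append(reported)
    print(f"{size // 2**20:>3} MiB in {ms} ms: reported {reported}, "
          f"computed {mib_per_s(size, ms):.3f}")
print(f"mean reported rate: {mean(rates):.1f} MB/s")
```

Comparing the mean before and after a cache setting change gives a rough, like-for-like view of the effect; single runs vary a lot, as the 64 MB line shows.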
LVL 28

Expert Comment

by:bgoering
ID: 35111858
Possibly bad news: this page (http://stuff.mit.edu/afs/athena/dept/cron/documentation/dell-server-admin/en/Perc6i_6e/chapterb.htm) indicates that BBWC isn't available on the CERC controller. That feature is pretty much a must for decent VM performance.

Author Comment

by:mkalugotla
ID: 35116141
Does this mean the CERC controller will have bad performance with ESXi?
LVL 6

Expert Comment

by:zane_o
ID: 35116235
@mkalugotla
No, your performance should be acceptable, but it could be better with write caching enabled. Your performance should be OK now, correct, now that the installation, patching, and AV scans have completed?
You will probably need to avoid things like scheduled AV scans all kicking off and running at the same time; instead, stagger the schedules and make sure they run in off hours. Also, on new servers you add, try fixed-size volumes if you have the disk space. If many different VMs are all trying to expand their volumes at the same time, it will impact performance. Also stagger things like automatic update reboot times.
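Staggering scheduled jobs is simple to plan up front. Below is a minimal Python sketch that spreads start times evenly across an off-hours window. The VM names and the 01:00-05:00 window are hypothetical, and you would still enter the resulting times into each guest's antivirus or update scheduler by hand.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def stagger(vms: List[str], window_start: datetime, window_hours: float) -> Dict[str, datetime]:
    """Give each VM its own start time, evenly spaced across the window."""
    if not vms:
        return {}
    step = timedelta(hours=window_hours) / len(vms)
    return {vm: window_start + i * step for i, vm in enumerate(vms)}


# Hypothetical VM names; 4-hour window starting at 01:00.
vms = ["dc01", "exch01", "sp01", "iis01"]
schedule = stagger(vms, datetime(2011, 3, 14, 1, 0), 4)
for vm, start in schedule.items():
    print(f"{vm}: {start:%H:%M}")
```

Even spacing is the simplest choice; if one job (say, a full scan of the Exchange server) runs much longer than the others, give it a slot of its own at the start of the window.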
In addition, 10 servers, especially things like Exchange or other potentially high I/O systems, running on just 4 drives can be a lot to ask. You would get better performance by spreading the load over more disks, preferably using a SAN.
I wouldn't panic, though; you are just learning the ins and outs of running in a virtual environment. I would also recommend increasing the vCPUs assigned to the VMs to 2 (at a minimum, depending on usage).
One thing I have done in the past is to "pre-expand" thin volumes by copying large files to the volume and then deleting them. You could copy about 10-15 GB of data to each server and then delete it to free up that space. That seemed to work for me.
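The "pre-expand" trick can be scripted inside a guest. This is a minimal Python sketch that writes a temporary file of a given size, forces it to disk, and deletes it. The function name is illustrative; it assumes the guest's writes actually allocate blocks on the thin disk, and it uses a non-zero fill byte in case some layer of the storage stack skips all-zero writes.

```python
import os


def pre_expand(path: str, total_bytes: int, chunk_bytes: int = 64 * 2**20) -> int:
    """Write `total_bytes` of filler to `path`, flush it to disk, then delete it.

    Returns the number of bytes written. A thin disk does not shrink on its
    own, so the space stays allocated even though the guest sees it as free.
    """
    chunk = b"\xa5" * chunk_bytes  # non-zero filler byte (arbitrary choice)
    view = memoryview(chunk)
    written = 0
    try:
        with open(path, "wb") as f:
            while written < total_bytes:
                n = min(chunk_bytes, total_bytes - written)
                f.write(view[:n])
                written += n
            f.flush()
            os.fsync(f.fileno())  # push the data through to the virtual disk
    finally:
        if os.path.exists(path):
            os.remove(path)  # free the space inside the guest again
    return written
```

For example, `pre_expand(r"D:\fill.tmp", 10 * 2**30)` would write and then remove about 10 GiB on drive D: (a hypothetical path). Run it on one VM at a time, for the same reason the scans above should be staggered.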
Your environment just needs a little "fine-tuning" to get the most out of it.
LVL 30

Expert Comment

by:IanTh
ID: 35116269
Yes, things like Exchange aren't really required to learn ESX/vSphere; you just need a DC and a vCenter server.

You will need an iSCSI solution to set up an ESX cluster, as you will not be able to learn HA, vMotion, and DRS without a cluster. I use Openfiler for that: my virtual ESX hosts connect to iSCSI on the Openfiler box.

Also, setting it up with virtual ESX servers makes the cluster easier to set up, as the cluster members have to be identical.
LVL 28

Expert Comment

by:bgoering
ID: 35117040
I would probably find performance to be unacceptable without write back caching. However, it all depends on you and your expectations and your needs.

I would take issue with the previous recommendation to increase vCPUs to at least two. I never exceed a single vCPU unless the demands of the application dictate allocating more. Workloads that run fine on one vCPU will actually suffer a performance hit if you allocate more than one, because of the overhead of managing and scheduling CPU in a virtual environment, as well as the overhead within the guest OS of managing an SMP system.

As noted by another expert, to fully exploit virtualization capabilities you will need a SAN or NFS for shared storage between multiple ESX(i) systems, as well as some form of paid licensing (as opposed to the free license) to enable the more advanced features. If you are just starting your evaluation of virtualization, local storage is fine, but for a fair indication of how it will perform I would make the additional investment in a RAID controller with BBWC for your disks. From what I can determine, that option is not available on a CERC 6 controller. You can get the PERC 6 controller with the BBWC module as a separate card that you plug in; the investment for that would be minimal considering what has already been invested in the M710 server.

Good Luck

Author Comment

by:mkalugotla
ID: 35127127
Hi,

I have enabled the 'Force WB with no battery' option in the RAID configuration and am verifying. So far I haven't seen any performance issues.
LVL 30

Expert Comment

by:IanTh
ID: 35127227
I don't think you will see the same benefit. A card with BBWC, like a PERC, will be faster; the 'force WB with no battery' setting exists for compatibility reasons, not performance.
LVL 28

Expert Comment

by:bgoering
ID: 35128012
@mkalugotla - I wouldn't leave that setting for any production load, or for data on the drive that you might want to keep. It might be ok for testing but be aware it does put your data on the drives at risk should you have a power failure.

Did you ever get those DiskBench numbers? It might be interesting to compare the results before and after your setting change.

Author Comment

by:mkalugotla
ID: 35130063
The system is configured as RAID 5, but with 'Force WB with no battery' enabled.
Is the data still at risk? If a power failure happens, could entire VMs and their VMDK files be lost? That would mean we are at greater risk.

LVL 28

Expert Comment

by:bgoering
ID: 35130214
Pretty much any or all of the data on the RAID volumes can be at risk. The difference between write through (the standard with no battery) and write back is:

Write through: the data is actually committed and written to disk before the OS gets notification that the I/O has completed.

Write back: the OS is notified that the I/O has completed as soon as the controller has the data in cache, which happens a lot faster than write through. If a power loss occurs while data is in cache but not yet on disk, the results will be unpredictable. The OS may have done many I/O operations that have not been committed to disk yet.

Typically with write back, if written data is still in cache, a read request for that data can also be served from cache without actually going to disk.
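The difference between the two modes, and why a power loss is dangerous without a battery, can be shown with a toy model. This is a hypothetical Python sketch, not how a real controller is implemented: the 0.1 ms cache and 8 ms disk latencies are made-up round numbers, and "power loss" simply discards whatever is still in volatile cache. A battery-backed cache is what keeps that data alive until power returns.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class ToyController:
    write_back: bool
    cache_latency_ms: float = 0.1  # hypothetical
    disk_latency_ms: float = 8.0   # hypothetical
    disk: Dict[int, str] = field(default_factory=dict)
    cache: Dict[int, str] = field(default_factory=dict)

    def write(self, block: int, data: str) -> float:
        """Store a block; return the latency the OS waits before 'I/O complete'."""
        if self.write_back:
            self.cache[block] = data  # acknowledged once it is in cache
            return self.cache_latency_ms
        self.disk[block] = data       # acknowledged only after it is on disk
        return self.disk_latency_ms

    def read(self, block: int) -> Optional[str]:
        """Serve from cache when possible, otherwise from disk."""
        if block in self.cache:
            return self.cache[block]
        return self.disk.get(block)

    def flush(self) -> None:
        """Destage cached writes to disk (a real controller does this in the background)."""
        self.disk.update(self.cache)
        self.cache.clear()

    def power_loss(self) -> Set[int]:
        """Drop volatile cache contents; return the blocks that never reached disk."""
        lost = set(self.cache)
        self.cache.clear()
        return lost


for mode in (False, True):
    ctl = ToyController(write_back=mode)
    total = sum(ctl.write(b, f"data{b}") for b in range(100))
    label = "write back" if mode else "write through"
    print(f"{label}: {total:.1f} ms for 100 writes, "
          f"lost on power failure: {len(ctl.power_loss())} blocks")
```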
LVL 28

Expert Comment

by:bgoering
ID: 35130229
Also, the more cache the better; often when BBWC is installed, the amount of cache goes up too. Did this change take care of your performance issue? That was actually the topic of this question.

Author Comment

by:mkalugotla
ID: 35130384
But I don't have BBWC, because the hardware has a CERC.
If I understand correctly, if a power failure happens, only the data that is in cache at that moment will be lost. All the other data, like snapshots and VMX files, will not be affected.
If I have the option to revert to snapshots even after a power failure, then I am safe.

Author Comment

by:mkalugotla
ID: 35130393
And the Dell server is on a corporate UPS system for backup power.

Author Closing Comment

by:mkalugotla
ID: 35130680
Thank you very much for the valuable information.
LVL 28

Expert Comment

by:bgoering
ID: 35130879
I still wouldn't consider it to be "safe" - suppose for example you lost the directory information for the volume -- after that you may not be able to find anything on the datastore! No vmdk or snapshot files. I would still highly recommend you obtain another controller to replace the CERC and that it is properly equipped with BBWC.
LVL 30

Expert Comment

by:IanTh
ID: 35131112
Just set the VMs to take snapshots automatically, if you have enough space, I suppose.