Solved

Hyper-V Disk Performance Problems

Posted on 2012-03-16
174 Views
Last Modified: 2014-10-02
Hi,

We have 3 Hyper-V hosts running 28 guest VMs on a Cluster Shared Volume on an iSCSI SAN over 3 Gb links. All VHD files are currently dynamic. The volume's underlying array has 16 x 10K spindles in a RAID 10. I suspect the issue is related to fragmentation: installing the PerfectDisk trial informed us that our CSV volume is nearly 60% fragmented. All of our VMs are experiencing performance issues. Each of our hosts has 48GB RAM and 32 processor cores.

Has anyone here had significant performance problems related to fragmentation on a CSV?  If so, did installing a product and scheduling frequent defrag operations solve the problem?

Thanks
Question by:USInspect
10 Comments
 
LVL 47

Accepted Solution

by:dlethe (earned 500 total points)
Realistically, this is unlikely to be an issue of fragmentation so much as an issue of insufficient IOPS and throughput.

First, defragging is always a good thing, but it isn't magic that will give you unlimited performance. At some point you simply need to add more HDDs.

28 guests sharing a 16-drive RAID 10? Unless those are SSDs, it's a pretty safe bet you need to add more storage and divide the workload rather than just defragging. A rough sanity check of the arithmetic is sketched below.
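As a back-of-envelope illustration of that claim, here is the capacity-versus-demand arithmetic as a minimal Python sketch. The per-disk IOPS figure and per-VM demand are illustrative assumptions, not measurements from this environment:

```python
# Rough capacity-vs-demand sketch for 28 guests on a 16-disk RAID 10.
# Both constants below are assumptions chosen for illustration only.
DISK_IOPS   = 140   # assumed random IOPS for one 10K RPM drive
PAIRS       = 8     # 16 disks in RAID 10 = 8 mirrored pairs
VMS         = 28
IOPS_PER_VM = 50    # assumed average random-I/O demand per guest

array_read_iops  = DISK_IOPS * PAIRS * 2  # both mirror members can serve reads
array_write_iops = DISK_IOPS * PAIRS      # each write must land on both members
demand           = VMS * IOPS_PER_VM

print(f"array capacity: ~{array_read_iops} read / ~{array_write_iops} write IOPS")
print(f"estimated demand: ~{demand} IOPS across {VMS} guests")
```

With these assumed numbers the array has headroom on reads but far less on writes, which is why adding spindles or splitting the workload is the usual fix.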
 
LVL 47

Expert Comment

by:dlethe
To do this on the cheap, I would get a low-cost RAID 1 controller, put a pair of quality SSDs on it, and split that up as needed. Put as much random I/O as you can on it. $1000 will buy you 40,000 IOPS in SSD, which is probably a good return on investment. Obviously total capacity isn't the issue.

Then move index files to the SSDs. Note that not all RAID controllers play well with SSDs, and I sure wouldn't let your existing controller drive both SSDs and mechanical disks unless it is on a PCIe 2.0 bus.
 

Author Comment

by:USInspect
Thanks. Here is some more info: our SAN vendor looked at the SAN logs and told us that it had not been using anywhere near the IOPS of which it is capable. The array has 16 x 10K SAS disks. When we set up our environment, Dell told us that a 12- or 14-bay SATA array would be sufficient for the IOPS our environment was reporting at the time, and our requirements have actually gone down a bit since then. So we're positioned much better than that, I think. Lastly, our environment hasn't really changed much over the past few months. It performed well for a few months, and then the problem started to get progressively worse without us having added any significant I/O workload (just a small handful of light VMs).
 
LVL 47

Expert Comment

by:dlethe
The easy way to find out: run perfmon on the VMs and look at the queue depth.
IOPS trade off against throughput (the bigger each I/O, the fewer of them per second), so you need to be sure how the IOPS were measured.

Most likely something was lost in translation. A hypervisor workload is effectively RANDOM I/O, and in your case probably 64KB in size. That is what is happening on the disk drives: 64KB random I/Os.

Read the specs on a typical 10K RPM enterprise SATA disk: they are good for about 150 random 4KB IOPS at the high end.

BUT ... since you are doing 64KB I/Os, each disk is good for only 1/(64/4) of that, or roughly 10 random 64KB IOPS.

That is in a perfect world on reads, with load balancing (each disk handles half of the read requests).

On writes, both disks in a mirror have to be written, so a pair delivers only one disk's worth of write IOPS (about 150 at 4KB, or roughly 10 at 64KB) the way you have it configured with your workload.

Did your vendor explain THAT to you?? [Note: on reads you can also do partial reads, so if you only need 4KB some controllers will ask for less than 64KB, but this is not guaranteed; on writes, in almost all cases you have to write the full 64KB stripe size, to both disks. Hopefully your controller has battery-backed cache, as this will improve write speed.]
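To make the numbers above concrete, here is the same arithmetic as a minimal Python sketch. The 150 IOPS baseline and the linear transfer-size scaling rule are dlethe's assumptions from this comment, not measured values:

```python
# dlethe's transfer-size scaling rule: random IOPS shrink in proportion
# to the transfer size. The 150 IOPS baseline at 4KB is his assumed figure.
BASE_4K_IOPS = 150

def random_iops(io_size_kb: int, base_4k: float = BASE_4K_IOPS) -> float:
    """Estimated per-disk random IOPS at a given I/O size (assumption-based)."""
    return base_4k / (io_size_kb / 4)

for size in (4, 8, 16, 32, 64):
    print(f"{size:>2}KB random I/O: ~{random_iops(size):.0f} IOPS per disk")
# 64KB -> ~9 IOPS per disk, i.e. the "roughly 10" figure above.
```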
 

Author Comment

by:USInspect
We aren't using SATA; we're using 10K SAS, which is much faster. I only brought up Dell's SATA recommendation (based on our perfmon results) as evidence that our issue shouldn't be related to the 16 x 10K SAS RAID 10 array we are using, because that is much more robust than what Dell recommended for our environment. When shopping for a storage solution, we were recommended similarly less robust configurations by NetApp, Compellent, HP, etc. What I was trying to say was that our SAN vendor looked at the storage array logs and found that it had never been asked to perform at levels anywhere near its maximum capabilities on reads or writes, including during the periods in which we experienced performance problems. So it would seem that would rule out an I/O problem at the SAN itself, no? Also, several times you referred to BOTH disks; I assume you are referring to each mirrored set within the RAID 10? Just to be clear, we are talking about 16 spindles, not 2, so with our reads and writes there are more than 2 disks involved. It is my understanding that each write would involve 16 disks (8, then the mirroring on the other 8), and each read would involve 8 disks.

Thanks for taking the time to help :)
 
LVL 47

Expert Comment

by:dlethe
10K SAS isn't that much faster. True, a SAS-2 backplane is 6Gbit and dual-pathed, but the individual disks certainly have the same mechanical limits. In fact, 10K SAS drives have pretty much the same numbers when it comes to IOPS.

15K RPM SAS disks are a step up, but 10K SAS and 10K enterprise SATA are pretty much the same thing when it comes to the mechanical aspects.

NO, you do not have 2 spindles' worth of performance on writes; you have 1, because both disks in a mirror must be written. The cache does some balancing, but if you are saturated with I/O requests due to underconfiguration, that benefit is lost.

Once the I/O queue is full and you can't postpone any more I/Os, everything stops and waits. Look at the queue depth to prove it; a sketch of how to sample it follows.
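As a minimal illustration, the queue depth can be sampled on the Hyper-V host with Windows' built-in typeperf counter tool, here driven from Python. The counter path assumes an English-language OS, and the per-spindle rule of thumb is a common guideline, not something stated in this thread:

```python
# Sample the physical-disk queue length dlethe suggests watching.
# Run on the Hyper-V host; uses Windows' built-in typeperf tool.
import subprocess

COUNTER = r"\PhysicalDisk(_Total)\Avg. Disk Queue Length"

# Collect 10 samples at typeperf's default 1-second interval.
result = subprocess.run(
    ["typeperf", COUNTER, "-sc", "10"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)

# Common rule of thumb (an assumption, not a hard spec): a sustained
# average queue length much above ~2 per spindle means I/Os are waiting.
```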
 

Author Comment

by:USInspect
Ultimately there are 2 spindles involved (in our case 16), because the data must be mirrored to the other spindle; I realize that is not necessarily synchronous with the write operation. So, considering that, each write gets 8 disks' worth of performance benefit in total, not 1, but I think we're saying the same thing.

Anyway, I'd love to hear from others who are using defragging software. 60% fragmentation seems very high, and because our I/O load hasn't changed much, I feel that is further evidence that it is not an underlying disk I/O issue with the SAN: the workload and the SAN have remained relatively unchanged as we moved from a well-performing Hyper-V environment to one with growing performance problems.
 
LVL 47

Expert Comment

by:dlethe
What I meant is that for every RAID 1 set you effectively have 1 spindle if your queue is full, but we are on the same page.

Defragging is no big deal, just kick it off. Worst case it won't provide much gain and will chew up bandwidth while it runs. But this treats a symptom. Why not try to cure the problem?

Are you still using the standard NTFS cluster size? If so, bump it up (which involves a backup/restore); this will make a nice difference. Lots of articles online about cluster sizes tell you why, as well as about disabling unnecessary NTFS behavior like generating additional I/O every time a file is even looked at, just to record that it was looked at. Use these techniques to ELIMINATE I/Os and they will translate directly into improved I/O performance. A sketch of how to check both settings follows.
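As a minimal sketch of how to inspect both of those settings, Windows' built-in fsutil reports the NTFS cluster size and the last-access-update behavior (the extra "it was looked at" I/O described above). The drive letter is an example only; substitute your CSV volume, and run elevated:

```python
# Check NTFS cluster size and last-access updates via Windows' fsutil.
import subprocess

def fsutil(*args: str) -> str:
    return subprocess.run(
        ["fsutil", *args], capture_output=True, text=True, check=True
    ).stdout

# "Bytes Per Cluster" in this output is the NTFS allocation unit size.
print(fsutil("fsinfo", "ntfsinfo", "C:"))

# 1 = disabled: NTFS no longer writes a last-access timestamp on reads.
# To disable it:  fsutil behavior set disablelastaccess 1
print(fsutil("behavior", "query", "disablelastaccess"))
```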
 

Author Comment

by:USInspect
Our SAN reports that the volume's cluster size is 512K. Our NTFS cluster size is 4K (the default). I would love to cure the problem rather than treat the symptom; I'm just trying to put my finger on the problem and have so far been unable to do so.

So, are you proposing to look at the queue depth from within each VM guest or at the host level? What is the threshold to be considered high? There just doesn't seem to be enough information available from Microsoft on this topic. I've seen a lot of people talking about cluster sizes in forums and blogs, but in all of the official Hyper-V documentation on TechNet I have yet to see any guidance in this area.
 

Author Comment

by:USInspect
Also, I just confirmed that the array is configured as RAID 10 with 16 x 300GB 15K SAS drives (not 10K as I previously stated). I was going from memory of the purchase two years ago, but I have verified that they are in fact 15K drives.