Hyper-V / SAN / iSCSI IOPS: What's normal?

Hey Experts,

We have been running Hyper-V on Server 2012 R2 for a year now on a standalone server and decided to move to a Windows cluster for better reliability.  We have a SAN running RAID 10, connected directly to the servers with 10 Gb DAC cables.  We are using iSCSI for these connections.

We have a terminal server that we exported from the standalone environment to the cluster, and we are seeing a huge difference in IOPS between the two.  I realize there are a lot of configuration differences between a standalone environment and a cluster.  When running CrystalDiskMark on both, we see the following:

Standalone Server:
Seq Read: 1891 MB/s
512k Read: 1429 MB/s
4k Read: 32.19 MB/s
4k QD32 Read: 245.1 MB/s

Seq Write: 1964 MB/s
512k Write: 1438 MB/s
4k Write: 28.27 MB/s
4k QD32 Write: 199.4 MB/s

Cluster with SAN / iSCSI:
Seq Read: 192 MB/s
512k Read: 182.7 MB/s
4k Read: 7.946 MB/s
4k QD32 Read: 12.4 MB/s

Seq Write: 141.3 MB/s
512k Write: 136 MB/s
4k Write: 4.279 MB/s
4k QD32 Write: 4.992 MB/s

I expected a drop in IO going from a standalone direct-attached-storage setup to a cluster with a SAN over 10 Gb iSCSI connections, but I didn't expect the SAN to deliver 10% of the standalone instance.  Is that normal?

IT Admin asked:

Those figures are definitely lower than they should be.

What SAN hardware are you using?
How many hosts share the SAN?
How many VMs are running on it?
Is it a single RAID group or more than one?
Do you have MPIO enabled/configured?
Does the SAN hardware have any monitoring/performance counters you could check to confirm whether there's a bottleneck?
Can you ping the SAN via the 10 Gb connections? Does the ping fluctuate during your tests?
IT Admin (Author) commented:
Hey totallytonto,

Thanks for the reply:

-We are using a Jetstor SAS 724HSD
-2 Hosts attached to the SAN
-Currently we have 4 VMs running on the Cluster
-We have the SAN separated into 3 RAID groups.  Currently all VMs are running on 1 RAID group
-Yes MPIO is enabled and configured
-The SAN has some graphing.  After watching it for a few minutes I see some spikes, but nothing over 500 MB/s on what should be a 10 Gb connection.  Mostly the connection runs at less than 100 MB/s.
-I can ping the SAN on the two 10 Gb connections.  No fluctuations; all pings <1 ms.

At this point it's down to finding the bottleneck.
It's normally likely to be either network related or disk related.
As your pings seem good, the network side appears OK at first glance, but it's still worth checking your config to be sure.
Is there a single 10 Gb connection from each host to the SAN? If so, why do you have MPIO enabled? Can the hosts see the SAN via the data network too? Is it possible traffic is flowing via the regular NIC and not the 10 Gb connection?

From a disk point of view, try stopping all VMs and run a few tests to see what it's like. Repeat the same tests with the VMs running and compare, to confirm whether a disk bottleneck is the issue.

IT Admin (Author) commented:
There are two connections from each server to the SAN.  The SAN has two controllers, so each server has a 10 Gb connection to SAN controller 1 and another to SAN controller 2.

The SAN is only accessible via the four connections mentioned above, as each has a separate subnet (10.0.0.x, 10.0.1.x, 10.0.2.x, 10.0.3.x), and those four subnets do not match our normal network subnet (192.168.1.x).  There is one management-port NIC on the SAN that does connect to our normal network, but it is labeled as management, so I don't think any traffic crosses it other than access to the SAN's web portal.

Philip Elder (Technical Architect - HA/Compute/Storage) commented:
Flip MPIO over to Least Queue Depth for all disks and try again. Round Robin in 2012 R2 can be problematic.

Make sure your network is tuned for iSCSI NIC to NIC. Jumbo Frames is one setting that can help a lot with throughput.
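If it helps, that policy switch can be scripted from an elevated prompt with mpclaim, which ships with the Windows MPIO feature. This is a sketch (policy number 4 should correspond to Least Queue Depth, but verify against your MPIO documentation before applying it cluster-wide):

```shell
:: Show current MPIO disks and their load-balance policies.
mpclaim -s -d

:: Set Least Queue Depth as the policy for all MPIO disks.
:: Common policy numbers: 1 = Fail Over Only, 2 = Round Robin, 4 = Least Queue Depth.
mpclaim -L -M 4
```

Run it on each cluster node, since the MPIO policy is per-host.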
Cliff Galiher commented:
As one last point, while I agree that the SAN numbers are low, the standalone numbers also seem high. I can't imagine how many disks it'd take to sustain 2 GB/s read speeds (not 2 Gb, but 2 GB)... it's tough to get that even from a cache for more than a few seconds. Maybe 24+ SSDs spread across multiple 6 Gb SATA channels could hit those numbers. But that's a rather unfair comparison to most SANs at that point...
IT Admin (Author) commented:
Hey guys,

Thanks for the comments.  

I have tried switching MPIO to Least Queue Depth and I still see roughly the same speeds.
Jumbo frames have always been enabled on both the iSCSI NICs and the SAN NICs.
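One way to confirm jumbo frames actually pass end-to-end (NIC, any switch in the path, and SAN port all have to agree) is a don't-fragment ping sized for a 9000-byte MTU. The SAN address below is a placeholder on the 10.0.0.x iSCSI subnet:

```shell
:: 8972 = 9000-byte MTU minus 20 bytes IP header minus 8 bytes ICMP header.
:: -f sets Don't Fragment; -l sets the payload size.
:: If this fails while a normal ping succeeds, jumbo frames are not
:: actually enabled end-to-end on that path.
ping -f -l 8972 10.0.0.100
```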

Did you try the tests with minimal VMs running and check the SAN's activity monitoring facilities? You need to find out where the bottleneck is.

IT Admin (Author) commented:
totallytonto: Yes, we made a discovery last night that creating a VM from scratch gave us much more favorable numbers.  It looks like something in the VM that we exported/imported is causing the slowness we are seeing.

Did you remove the old RAID/chipset drivers?
IT Admin (Author) commented:
Hey totallytonto,

We moved this server from physical to VM quite a while ago, so I don't recall doing that.  But when I look at Storage Controllers under Device Manager I only see "Microsoft Storage Spaces Controller".  I'm not seeing a driver for the PERC controller that we have on the old Dell physical server.

Anywhere else I need to look?

Philip Elder (Technical Architect - HA/Compute/Storage) commented:
Make sure any device drivers and server management applications for the previous hardware are removed.

From an elevated CMD prompt:
set devmgr_show_nonpresent_devices=1
start devmgmt.msc
Then View --> Show hidden devices.

Clean up any references to previous hardware. This will include NICs, CPUs, chipset, PCI bus, and others. You may find hardware vendor management/monitoring hooks in there too that need to be cleaned out.