Hyper-V / SAN / iSCSI IOPS: What's normal?

Hey Experts,

We have been running Hyper-V on Server 2012 R2 for a year now on a standalone server and decided to move to a Windows cluster for better reliability.  We have a SAN running RAID 10, connected directly to the servers with 10 Gb DAC cables.  We are using iSCSI for these connections.

We have a terminal server that we exported from the standalone environment to the cluster, and we are seeing a huge difference in IOPS between the two.  I realize there are a lot of configuration differences between a standalone environment and a cluster.  When running CrystalDiskMark on both, we see the following:

Standalone Server:
Seq Read: 1891 MB/s
512k Read: 1429 MB/s
4k Read: 32.19 MB/s
4k QD32 Read: 245.1 MB/s

Seq Write: 1964 MB/s
512k Write: 1438 MB/s
4k Write: 28.27 MB/s
4k QD32 Write: 199.4 MB/s

Cluster with SAN / iSCSI:
Seq Read: 192 MB/s
512k Read: 182.7 MB/s
4k Read: 7.946 MB/s
4k QD32 Read: 12.4 MB/s

Seq Write: 141.3 MB/s
512k Write: 136 MB/s
4k Write: 4.279 MB/s
4k QD32 Write: 4.992 MB/s

I expected a drop in IO going from a standalone direct-attached-storage setup to a cluster with a SAN over 10 Gb iSCSI connections, but I didn't expect the SAN to deliver 10% of the standalone instance.  Is that normal?

IT Admin asked:

Those figures are definitely lower than they should be.

What SAN hardware are you using?
How many hosts share the SAN?
How many VMs are running on it?
Is it a single RAID group or more than one?
Do you have MPIO enabled/configured?
Does the SAN hardware have any monitoring/performance counters you could check to confirm whether there's a bottleneck?
Can you ping the SAN via the 10 Gb connections? Does the ping fluctuate during your tests?
IT Admin (Author) commented:
Hey totallytonto,

Thanks for the reply:

-We are using a Jetstor SAS 724HSD
-2 Hosts attached to the SAN
-Currently we have 4 VMs running on the Cluster
-We have the SAN separated into 3 RAID groups.  Currently all VMs are running on 1 RAID group
-Yes MPIO is enabled and configured
-The SAN has some graphing.  After watching it for a few minutes I see some spikes, but nothing over 500 MB/s on what should be a 10 Gb connection.  Mostly the connection runs at less than 100 MB/s.
-I can ping the SAN on the two 10 Gb connections.  No fluctuations; all pings <1 ms.

At this point it's down to finding the bottleneck.
It's normally likely to be either network related or disk related.
As your pings seem good, the network side appears OK at first glance, but it's still worth checking your config to be sure.
Is there a single 10 Gb connection from each host to the SAN? If so, why do you have MPIO enabled? Can the hosts see the SAN via the data network too? Is it possible traffic is flowing via the regular NIC and not the 10 Gb connection?

From a disk point of view, try stopping all VMs and run a few tests to see what it's like. Repeat the same tests with the VMs running and compare, to confirm whether a disk bottleneck is the issue.

IT Admin (Author) commented:
There are two connections from each server to the SAN.  The SAN has two controllers, so each server has a 10 Gb connection to SAN controller 1 and another to SAN controller 2.

The SAN is only accessible via the four connections mentioned above, as each has a separate subnet (10.0.0.x, 10.0.1.x, 10.0.2.x, 10.0.3.x), and those four subnets do not match our normal network subnet (192.168.1.x).  There is one management-port NIC on the SAN that does connect to our normal network, but it is labeled as management, so I don't think any traffic crosses it other than access to the SAN's web portal.

Philip Elder (Technical Architect - HA/Compute/Storage) commented:
Flip MPIO over to Least Queue Depth for all disks and try again. Round Robin in 2012 R2 can be problematic.

Make sure your network is tuned for iSCSI NIC to NIC. Jumbo Frames is one setting that can help a lot with throughput.
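If it helps, that policy switch can be scripted from an elevated prompt with mpclaim, which ships with the Windows MPIO feature. This is a sketch (policy number 4 should correspond to Least Queue Depth, but verify against your MPIO documentation before applying it cluster-wide):

```shell
:: Show current MPIO disks and their load-balance policies.
mpclaim -s -d

:: Set Least Queue Depth as the policy for all MPIO disks.
:: Common policy numbers: 1 = Fail Over Only, 2 = Round Robin, 4 = Least Queue Depth.
mpclaim -L -M 4
```

Run it on each cluster node, since the MPIO policy is per-host.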
Cliff Galiher commented:
As one last point, while I agree that the SAN numbers are low, the standalone numbers also seem high. I can't imagine how many disks it'd take to sustain 2 GB/s read speeds (not 2 Gb, but 2 GB)... it's tough to get that even from a cache for more than a few seconds. Maybe 24+ SSDs spread across multiple 6 Gb SATA channels could hit those numbers. But that's a rather unfair comparison to most SANs at that point...
IT Admin (Author) commented:
Hey guys,

Thanks for the comments.  

I have tried switching MPIO to Least Queue Depth and I still see roughly the same speeds.
Jumbo frames have always been enabled on both the iSCSI NICs and the SAN NICs.
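One way to confirm jumbo frames actually pass end-to-end (NIC, any switch in the path, and SAN port all have to agree) is a don't-fragment ping sized for a 9000-byte MTU. The SAN address below is a placeholder on the 10.0.0.x iSCSI subnet:

```shell
:: 8972 = 9000-byte MTU minus 20 bytes IP header minus 8 bytes ICMP header.
:: -f sets Don't Fragment; -l sets the payload size.
:: If this fails while a normal ping succeeds, jumbo frames are not
:: actually enabled end-to-end on that path.
ping -f -l 8972 10.0.0.100
```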

Did you try the tests with minimal VMs running and check the SAN's activity monitoring facilities? You need to find out where the bottleneck is.

IT Admin (Author) commented:
totallytonto: Yes, we made a discovery last night that creating a VM from scratch gave us much more favorable numbers.  It looks like something in the VM that we exported/imported is causing the slowness we are seeing.

Did you remove the old RAID/chipset drivers?
IT Admin (Author) commented:
Hey totallytonto,

We moved this server from physical to VM quite a while ago, so I don't recall doing that.  But when I look at Storage Controllers under Device Manager I only see "Microsoft Storage Spaces Controller".  I'm not seeing a driver for the PERC controller that we have on the old Dell physical server.

Anywhere else I need to look?

Philip Elder (Technical Architect - HA/Compute/Storage) commented:
Make sure any device drivers and server management applications for the previous hardware are removed.

From an elevated CMD prompt:
set devmgr_show_nonpresent_devices=1
start devmgmt.msc
Then View --> Show hidden devices.

Clean up any references to previous hardware. This will include NICs, CPUs, chipset, PCI bus, and others. You may find hardware vendor management/monitoring hooks in there too that need to be cleaned out.