High iSCSI latency but low bandwidth usage

Dimarc67

Asked:
Single Hyper-V 2012 host server, one Windows Server 2012 VM running SQL Server 2016.
iSCSI storage is connected to host via two 1G links (MPIO).

PerfMon traces on the host server show high latency (40ms+) on iSCSI NICs, but only 20-50% bandwidth usage.
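For reference, this is roughly how we're pulling those counters (a sketch using typeperf, which ships with Windows; the counter instance names will differ per system):

```python
import csv
import io
import subprocess

COUNTERS = [
    r"\PhysicalDisk(*)\Avg. Disk sec/Read",
    r"\PhysicalDisk(*)\Avg. Disk sec/Write",
    r"\Network Interface(*)\Bytes Total/sec",
]

# Capture 60 one-second samples; typeperf writes CSV to stdout by default.
result = subprocess.run(
    ["typeperf", *COUNTERS, "-si", "1", "-sc", "60"],
    capture_output=True, text=True, check=True,
)

rows = [r for r in csv.reader(io.StringIO(result.stdout)) if r]
header = rows[0]
samples = [r for r in rows[1:] if len(r) == len(header)]

# Average each counter over the window (column 0 is the timestamp).
for col, name in enumerate(header[1:], start=1):
    values = [float(r[col]) for r in samples if r[col].strip()]
    if values:
        print(f"{name}: avg={sum(values) / len(values):.6f}")
```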

The storage system's own stats show similar numbers to what the host sees.

We're considering upgrading the SAN to 10GbE, but we're not sure that would improve performance since we're not maxing out the current bandwidth.  The same uncertainty applies to adding additional links from the host to storage.

What other information can we gather toward identifying bottlenecks?
How can we improve iSCSI latency?
Cliff G.
Distinguished Expert 2018

Commented:
1Gb is unsuitable for storage with VMs.  It is definitely a bottleneck.

10Gb is a *minimum* for performant iSCSI storage; 40Gb is better. RDMA, RoCE, etc. should also be high on the list of things to look at.
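A quick way to see what the host NICs are even capable of on that front is to query their RDMA state (a sketch, shelling out to PowerShell's Get-NetAdapterRdma, available on Windows Server 2012 and later):

```python
import subprocess

# Surface RDMA capability/state of the host NICs via PowerShell.
result = subprocess.run(
    ["powershell", "-NoProfile", "-Command",
     "Get-NetAdapterRdma | Format-Table Name, Enabled -AutoSize"],
    capture_output=True, text=True,
)
print(result.stdout or result.stderr)
```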
John Tsioumpris, Software & Systems Engineer

Commented:
40ms latency is worse than the Internet... something is seriously wrong here.
My suggestion is to build a physical machine with dual NICs suitable for iSCSI (it can even be a workstation) and test again.
If buying one is too time-consuming, then test with a single NIC, though it won't utilize MPIO.
Either way, latency should be <1ms at all times.
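Something like this crude probe gives a rough read-latency number from whatever box you test from (a sketch; the file path is a placeholder, and OS caching will flatter the numbers unless the test file is much larger than RAM, so a dedicated tool such as Microsoft's diskspd is the better option):

```python
import os
import random
import time

TEST_FILE = r"E:\latency_probe.bin"  # hypothetical path on a volume backed by the iSCSI LUN
BLOCK = 8192                         # 8 KB reads, roughly a SQL Server page
SAMPLES = 500

size = os.path.getsize(TEST_FILE)
fd = os.open(TEST_FILE, os.O_RDONLY | os.O_BINARY)  # O_BINARY is Windows-only
latencies_ms = []
try:
    for _ in range(SAMPLES):
        offset = random.randrange(0, size - BLOCK) & ~(BLOCK - 1)  # block-aligned offset
        os.lseek(fd, offset, os.SEEK_SET)
        start = time.perf_counter()
        os.read(fd, BLOCK)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
finally:
    os.close(fd)

latencies_ms.sort()
print(f"avg={sum(latencies_ms) / len(latencies_ms):.2f} ms  "
      f"p95={latencies_ms[int(0.95 * len(latencies_ms))]:.2f} ms")
```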
Andrew Hancock (VMware vExpert / EE Fellow), VMware and Virtualization Consultant
Fellow 2018
Expert of the Year 2017

Commented:
Jumbo Frames?

We've been using 1GbE iSCSI for years with no issues. 10GbE wasn't even available when iSCSI was first introduced!

Author

Commented:
Thanks for the responses guys.

Cliff G.--
Our 1G iSCSI bandwidth usage is never higher than about 60%, so it looks like the latency issue isn't bandwidth.  That said, we're certainly considering the upgrade, but it'll be $16K for the switches and NICs.  At that price, we need to know it will actually solve the problem before we spend the money.

John T.--
Good suggestion.  We're not in a position to connect another system to the SAN, but in addition to the stand-alone Hyper-V host, we have a four-node Hyper-V cluster connected to the same SAN with dedicated LUNs for the CSVs.  We'll take a look at their NIC metrics to compare against the SQL VM's host server.
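For that comparison we'll probably just pull the same counter from each box, something along these lines (host names are placeholders; remote collection needs Performance Monitor rights and the Remote Registry service on each target):

```python
import subprocess

# Placeholder host names: the standalone SQL host plus the four cluster nodes.
HOSTS = ["HV-SQL01", "HV-CL01", "HV-CL02", "HV-CL03", "HV-CL04"]

for host in HOSTS:
    counter = rf"\\{host}\Network Interface(*)\Bytes Total/sec"
    out = subprocess.run(
        ["typeperf", counter, "-si", "1", "-sc", "10"],
        capture_output=True, text=True,
    ).stdout
    print(f"=== {host} ===\n{out}")
```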

Andrew H.--
Jumbo frames are enabled on server NICs, switches, and storage.  (Good thought, though.)
John Tsioumpris, Software & Systems Engineer

Commented:
Physical is physical and virtual is virtual... unless you ensure that nothing strange is going on with the connections, we are only guessing. Just pick a laptop with a Gigabit NIC and test.
Network Engineer
Commented:
What's the storage array? Vendor, model, controller specs, drive count, drive type, and RAID configuration are all useful. If your drives are overloaded, their latency will be really high, and so will the overall latency. How many read and write IOs are you getting? I have seen overloaded arrays doing over 100 ms at times.

Does your array have SSD? I would pick a fast array using SSD on gigabit connections before a slower array with HDD on 10 gigabit connections for virtualization workloads most days of the week.
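As a back-of-the-envelope check, the array's IOPS ceiling works out roughly like this (a sketch; every number below is an illustrative placeholder, not your array's actual spec):

```python
# All numbers below are illustrative placeholders, not this array's real spec.
PER_DRIVE_IOPS = {"7.2k_nl_sas": 80, "10k_sas": 140, "15k_sas": 180, "ssd": 5000}
RAID_WRITE_PENALTY = {"raid0": 1, "raid10": 2, "raid5": 4, "raid6": 6}

drives = 12            # data drives in the RAID group / aggregate
drive_type = "10k_sas"
raid = "raid10"
read_pct = 0.60        # read fraction of the workload

raw = drives * PER_DRIVE_IOPS[drive_type]
# Host-visible IOPS once the RAID write penalty is applied to the write fraction:
# raw = H*read_pct + H*(1-read_pct)*penalty  =>  H = raw / (read_pct + (1-read_pct)*penalty)
effective = raw / (read_pct + (1 - read_pct) * RAID_WRITE_PENALTY[raid])
print(f"raw backend IOPS ~{raw}, effective host-visible IOPS ~{effective:.0f}")
```

If the IOs you're actually pushing are anywhere near that ceiling, the 40ms+ latency is the drives, not the network.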
Andrew Hancock (VMware vExpert / EE Fellow), VMware and Virtualization Consultant
Fellow 2018
Expert of the Year 2017

Commented:
We've had latency issues on our SANs when the storage processors/CPUs get overloaded, are doing scrubbing, or someone deletes 1,000,000 files!
Philip Elder, Technical Architect - HA/Compute/Storage

Commented:
Make sure the Jumbo Frame setting is identical across the entire stack end to end.
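A quick end-to-end check is a don't-fragment ping sized for a 9000-byte MTU from the host to each iSCSI portal (a sketch; the portal addresses are placeholders):

```python
import subprocess

# Portal addresses are placeholders. Windows ping: -f = don't fragment,
# -l = payload bytes, -n = count. 8972 = 9000-byte MTU minus 28 bytes of IP/ICMP headers.
ISCSI_PORTALS = ["192.168.10.10", "192.168.11.10"]

for ip in ISCSI_PORTALS:
    result = subprocess.run(
        ["ping", "-f", "-l", "8972", "-n", "3", ip],
        capture_output=True, text=True,
    )
    ok = "TTL=" in result.stdout and "needs to be fragmented" not in result.stdout
    print(f"{ip}: {'jumbo path OK' if ok else 'jumbo frames NOT passing end to end'}")
```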

Bandwidth and latency are largely independent of each other when it comes to storage traffic.
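To illustrate the point, link speed only determines the serialization time of each IO, which is nowhere near 40 ms (assuming an 8 KB IO and ignoring protocol overhead):

```python
# Wire (serialization) time per 8 KB IO at each link speed, ignoring protocol overhead.
io_bytes = 8 * 1024
for gbps in (1, 10):
    wire_time_ms = (io_bytes * 8) / (gbps * 1e9) * 1000
    print(f"{gbps} GbE: ~{wire_time_ms:.3f} ms per 8 KB IO on the wire")
# Roughly 0.066 ms at 1 GbE and 0.007 ms at 10 GbE; a 40 ms response time is
# coming from the disks, controller, or queuing, not from link speed.
```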

Are there two switches involved since there are two paths? If yes, are both switches set up the same?

When latency is happening is it possible to log on to the switch(es) to check on performance?

Are the cable runs up to snuff? Are patch cables in good shape?
What type of storage (how many of what speed disks in what RAID configuration) is configured on the iSCSI target?

What is the iSCSI target running on?

Is the iSCSI target used by anything else?

Author

Commented:
Thanks for all of the suggestions, everyone.  We were able to confirm with NetApp support that we're hitting the ceiling of our physical disks' I/O.  They agree that bandwidth is not our bottleneck and that upgrading to 10G likely won't change anything with our current disks.

For our usage and priorities, the better choice for us is to move the data to an SSD RAID10 installed on the local server.  It's FAR cheaper than upgrading the SAN storage, and the data doesn't need more fault tolerance than a local RAID will provide.
