Dimarc67

asked on

High iSCSI latency but low bandwidth usage

Single Hyper-V 2012 host server, one Windows Server 2012 VM running SQL Server 2016.
iSCSI storage is connected to host via two 1G links (MPIO).

PerfMon traces on the host server show high latency (40ms+) on iSCSI NICs, but only 20-50% bandwidth usage.
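For context, this is roughly the kind of capture we're running on the host. It's just a minimal sketch in Python wrapping the built-in typeperf tool; the exact counter paths and the 60-second window are illustrative, not anything special about our setup.

# Minimal sketch: sample host-side disk latency and iSCSI NIC throughput over
# the same window so the two can be correlated. Assumes it runs on the
# Hyper-V host where typeperf (built into Windows) is available; the counter
# paths and sampling window are illustrative choices.
import subprocess

COUNTERS = [
    r"\PhysicalDisk(*)\Avg. Disk sec/Read",      # read latency seen by the host
    r"\PhysicalDisk(*)\Avg. Disk sec/Write",     # write latency seen by the host
    r"\PhysicalDisk(*)\Current Disk Queue Length",
    r"\Network Interface(*)\Bytes Total/sec",    # throughput on the iSCSI NICs
]

def capture(outfile="iscsi_latency.csv", interval=1, samples=60):
    """Collect one sample per second for a minute into a CSV for later review."""
    cmd = ["typeperf", *COUNTERS,
           "-si", str(interval), "-sc", str(samples),
           "-f", "CSV", "-o", outfile, "-y"]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    capture()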

Storage system stats show similar info to host.

We're considering upgrading the SAN to 10GbE, but we're not sure that would improve performance since we're not maxing out the current bandwidth.  We have the same uncertainty about adding additional links from the host to storage.

What other information can we gather toward identifying bottlenecks?
How can we improve iSCSI latency?
Cliff Galiher

1Gb is unsuitable for storage with VMs.  It is definitely a bottleneck.

10Gb is a *minimum* for performant iSCSI storage.  40Gb is better. RDMA, RoCE, etc. should also be high on the list of things to look at.
40ms latency is worse than the Internet... something is seriously wrong here.
My suggestion is to build a physical machine with dual NICs suitable for iSCSI (it can even be a workstation) and test again.
If buying one would take too long, then try with a single NIC, but it won't utilize MPIO.
Either way, latency should be <1ms at all times.
Jumbo Frames?

We've been using 1GbE iSCSI for years with no issues. 10GbE wasn't even available when iSCSI first came out!
Dimarc67

ASKER

Thanks for the responses guys.

Cliff G.--
Our 1G iSCSI bandwidth usage is never higher than about 60%, so it looks like the latency issue isn't bandwidth.  That said, we're certainly considering the upgrade, but it'll be $16K for the switches and NICs.  At that price, we have to know that it will actually solve the problem before we spend it.

John T.--
Good suggestion.  We're not in a position to connect another system to the SAN, but in addition to the stand-alone Hyper-V host, we have a four-node Hyper-V cluster connected to the same SAN with dedicated LUNs for the CSVs.  We'll take a look at their NIC metrics to compare against the SQL VM host server.

Andrew H.--
Jumbo frames are enabled on server NICs, switches, and storage.  (Good thought, though.)
Physical is physical and virtual is virtual... unless you ensure that nothing strange is going on with the connections, we're only guessing. Just pick a laptop with a Gigabit NIC.
ASKER CERTIFIED SOLUTION
kevinhsieh
We've had latency issues on our SANs when the storage processors/CPUs get overloaded, or they're doing scrubbing, or someone deletes 1,000,000 files!
Make sure the Jumbo Frame setting is identical across the entire stack end to end.
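A quick way to verify that end to end is a don't-fragment ping at full jumbo size from the host to each iSCSI target portal; if any hop is not set for jumbo frames, the ping fails. Here's a rough sketch (Python wrapping the Windows ping command; the portal addresses are placeholders, not yours):

# Rough sketch: confirm jumbo frames pass unfragmented from the host to each
# iSCSI target portal. Assumes Windows ping syntax (-f = don't fragment,
# -l = payload size); the portal IPs below are hypothetical placeholders.
import subprocess

JUMBO_PAYLOAD = 8972  # 9000-byte MTU minus 20-byte IP header and 8-byte ICMP header
PORTALS = ["192.168.100.10", "192.168.101.10"]  # hypothetical iSCSI portal addresses

def jumbo_ok(addr: str) -> bool:
    """True if a full-size, don't-fragment ping reaches the portal."""
    result = subprocess.run(
        ["ping", "-n", "2", "-f", "-l", str(JUMBO_PAYLOAD), addr],
        capture_output=True, text=True)
    return result.returncode == 0 and "needs to be fragmented" not in result.stdout

if __name__ == "__main__":
    for portal in PORTALS:
        print(f"Jumbo frames to {portal}: {'OK' if jumbo_ok(portal) else 'FAILING'}")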

Bandwidth and latency are independent of each other when it comes to storage traffic; you can have a latency problem without ever saturating the links.

Are there two switches involved since there are two paths? If yes, are both switches set up the same?

When latency is happening is it possible to log on to the switch(es) to check on performance?

Are the cable runs up to snuff? Are patch cables in good shape?
What type of storage (how many disks, of what speed, in what RAID configuration) is configured on the iSCSI target?

What is the iSCSI target running on?

Is the iSCSI target used by anything else?
Thanks for all of the suggestions, everyone.  We were able to confirm with NetApp support that we're hitting the I/O ceiling of our physical disks.  They agree that bandwidth is not our bottleneck and that upgrading to 10G likely won't change anything with our current disks.
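For anyone hitting the same wall, the back-of-the-envelope math shows why the spindles can be the ceiling even while the links sit half idle. Every number in this sketch is hypothetical (rule-of-thumb per-disk IOPS, an assumed RAID write penalty and read/write mix), not our actual aggregate:

# Back-of-the-envelope sketch of why spindles, not links, can be the ceiling.
# Every number below is a hypothetical assumption -- substitute the real
# aggregate layout and workload mix.
DISKS = 12                  # hypothetical spindle count in the aggregate
IOPS_PER_DISK = 140         # rough rule of thumb for a 10K SAS drive
RAID_WRITE_PENALTY = 4      # classic RAID 5 figure; double parity is nominally higher
READ_FRACTION = 0.7         # assumed 70/30 read/write mix
AVG_IO_KB = 8               # assumed average I/O size

raw_iops = DISKS * IOPS_PER_DISK
# Standard write-penalty formula: frontend IOPS the disks can absorb.
frontend_iops = raw_iops / (READ_FRACTION + (1 - READ_FRACTION) * RAID_WRITE_PENALTY)
throughput_mb_s = frontend_iops * AVG_IO_KB / 1024

print(f"Raw backend IOPS:       {raw_iops:,.0f}")
print(f"Usable frontend IOPS:   {frontend_iops:,.0f}")
print(f"Bandwidth at that rate: {throughput_mb_s:,.1f} MB/s "
      f"(a single 1GbE link carries roughly 117 MB/s)")

With small random I/O like SQL Server generates, you run out of disk IOPS long before you run out of link bandwidth, which matches what we're seeing.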

For our usage and priorities, the better choice for us is to move the data to an SSD RAID10 installed on the local server.  It's FAR cheaper than upgrading the SAN storage, and the data doesn't need more fault tolerance than a local RAID will provide.