Why does my 10Gb iSCSI setup see such high latency and how can I fix it?

Posted on 2014-07-17
Last Modified: 2016-11-23
I have an iSCSI server with the following configuration:

Dell R510
Perc H700 Raid controller
Windows Server 2012 R2
Intel Ethernet X520 10Gb
12 near-line SAS drives
I have tried both StarWind and the built-in Server 2012 iSCSI target software but see similar results. I am currently running the latest version of StarWind's free iSCSI server.

I have connected it to an HP 8212 switch via a 10Gb port, which also connects at 10Gb to our VMware servers. I have a dedicated VLAN just for iSCSI and have enabled jumbo frames on that VLAN.

I frequently see very high latency on my iSCSI storage, so much so that it can time out or hang VMware, and I am not sure why. When I run IOmeter I get some pretty decent results.

I am trying to determine why I see such high latency (100+ ms). It doesn't always happen, but several times throughout the day VMware complains about the latency of the datastore. I have a 10Gb iSCSI connection between the servers, and I wouldn't expect the disks to be able to max that out; the highest I saw when running IOmeter was around 5Gb/s. I also don't see much load at all on the iSCSI server when the high latency occurs. It seems network related, but I am not sure what settings I could check. The 10Gb connection should be plenty, as I said, and it is nowhere near maxed out.

Any thoughts on configuration changes I could make to my VMware environment or network card settings, or ideas on where to troubleshoot? I am not able to find what is causing it. I referenced this document for changes to my iSCSI settings.

Thank you for your time.
Question by:gacus
LVL 119
ID: 40202720
So your "SAN" is running the StarWind software iSCSI target, connected to VMware ESXi?

Author Comment

ID: 40202982
LVL 119

Assisted Solution

by:Andrew Hancock (VMware vExpert / EE MVE^2)
Andrew Hancock (VMware vExpert / EE MVE^2) earned 167 total points
ID: 40203026
There are some specific iSCSI settings that we use, which are vendor defined; I'll dig them out tomorrow and you could try them. These are the settings recommended by HP and NetApp for their hardware SANs.

Have you configured multipathing?

Also, which version of ESXi are you using?
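One quick check in the meantime: make sure jumbo frames actually pass end to end, because an MTU mismatch anywhere along the path (vmkernel port, vSwitch, switch VLAN, or the target NIC) shows up as exactly this kind of intermittent latency. A do-not-fragment ping from both sides will tell you (the IP address below is a placeholder for your iSCSI target):

```shell
# From the ESXi host: 8972 = 9000 minus 28 bytes of IP/ICMP headers;
# -d sets the do-not-fragment bit so any MTU mismatch fails loudly.
vmkping -d -s 8972 192.168.10.20

# From the Windows iSCSI target (cmd.exe), same idea:
#   ping -f -l 8972 <esxi-vmkernel-ip>
```

If the large ping fails while a normal ping succeeds, some hop in the path is not passing jumbo frames.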


Author Comment

ID: 40203036
ESXi 5.5, build 1891313

I only have one 10Gb connection for my iSCSI, so no need for multipathing. It should be plenty for this server. It would be nice to have a backup path, but the cost of the extra 10Gb connection makes it not an option.
LVL 57

Accepted Solution

Cliff Galiher earned 167 total points
ID: 40203442
I know you probably won't like this answer (and, of course, you are welcome to try to find a better one), but I don't think there is much you can do in this situation. The factors are several, all accumulating to produce the behavior you are seeing.

First is the controller. The H700 is honestly a mid-range controller at best. The H800 series is a bit better, but for "roll your own SAN" solutions, none of the Dell controllers are very good. You really have to consider going with a native LSI controller if you want good performance. Dell doesn't expect their servers to be used as SANs, so they heavily optimize their drivers and caching routines for single-application, or at least on-server, access. Because of how iSCSI I/O flows, it can basically negate the entire controller cache, and obviously that comes with a performance hit.

Your second issue is the NL-SAS drives. The distinction between NL-SAS and SAS is a simple one: NL-SAS is a SATA drive with SAS firmware bolted on. Sure, it can "understand" SAS commands, but it doesn't really do the things real SAS drives do, like queue reprioritization. A real SAS drive can take instructions from the controller and find the most optimal way to process them. An NL-SAS drive will usually do minimal or (more often) no optimization and just handle requests in the order it received them, which, during heavy I/O or even moderate random I/O, can add sudden latency.

iSCSI on a server certainly has a place. For archival storage, backup storage, and similar uses, iSCSI on a target server is *great*: single streams of I/O, and if a failure occurs, reasonable downtime is not an issue. But for the usual place people want a SAN, which is the use case you are describing, the benefits just aren't there. Servers aren't optimized for this use, and of course the whole point of running multiple VMware or Hyper-V nodes is to eliminate single points of failure... but with a "roll your own SAN" server, you've just kicked the can down the road: the storage is now the single point of failure. That isn't particularly useful.

Truth is, given the platform you built, I think you'll just have to accept the latency. The bottleneck isn't the 10Gb link; it is the I/O on the target. Because your target is running Windows, you *do* have the benefit of turning on performance monitors and counters to verify this. I think when you do, you'll find your actual disk queues on the target are high when you see the latency warnings, while your network utilization is still relatively low.
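To make that check concrete, the built-in typeperf tool on the Windows target can log the relevant counters during a latency spike (a sketch; the counter instance names may differ slightly on your system):

```shell
:: Sample disk queue depth, disk latency, and NIC throughput once per
:: second for 60 samples, writing a CSV you can line up against the
:: vSphere datastore latency alarms.
typeperf "\PhysicalDisk(_Total)\Avg. Disk Queue Length" ^
         "\PhysicalDisk(_Total)\Avg. Disk sec/Transfer" ^
         "\Network Interface(*)\Bytes Total/sec" ^
         -si 1 -sc 60 -f CSV -o iscsi_latency.csv
```

High Avg. Disk sec/Transfer (tens of milliseconds) alongside low Bytes Total/sec would confirm the disks, not the network, as the bottleneck.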


Author Comment

ID: 40203480
All I am using this SAN for is VMware backup, so I don't need high performance or redundancy. We have high-end enterprise Fibre Channel SANs for our VMs. I understand what performance I should get out of my low-end drives and RAID controller, and I get it sometimes. Other times I see high latency.

The issue is I hardly see any load on my disks during the latency issues.
LVL 119
ID: 40203502
If it's just for backup, performance should be fine?

Replace the OS, create a JBOD using an LSI HBA (no RAID), use ZFS on a Solaris implementation, and add a few SSDs for the ZIL and L2ARC cache.
LVL 47

Expert Comment

ID: 40203775
Your problem is most likely your RAID config. Let me guess: reads are OK, but writes crap out after a few seconds sustained? If that is the case, it confirms it is your RAID config. No tweaking other than going to RAID10 and smaller volumes will help.

Author Comment

ID: 40204388
Thank you all for your advice!  I really appreciate it!

Very interesting, dlethe. I use RAID10 on all of our other systems, but since this is just for backup and I needed the space, I used RAID6. Can you provide more detail on what you are saying? It does appear that it slows down over time, and reads are indeed noticeably faster.

Andrew: I don't have the budget to add to the configuration, which is why it is such a cheap setup. I would also lose a good chunk of space removing two of the drives. Any thoughts on what I could do without adding anything?
LVL 47

Expert Comment

ID: 40204415
RAID6 is slow as heck at writing. The reason it is fast for a second or so and then drops is that the cache buffers the writes; once the cache is full (so the controller actually has to write to the disks), it cranks down.

If you want speed for writes, don't use RAID6. It is that simple. Google articles about how RAID5 and RAID6 work and see for yourself.
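To put rough numbers on that: every small random write on RAID6 costs six disk operations (read data, read P, read Q, write data, write P, write Q), versus two on RAID10. A back-of-the-envelope sketch, assuming ~75 random IOPS per 7.2K NL-SAS spindle (a typical figure, not measured on your array):

```shell
disks=12
iops_per_disk=75          # typical 7.2K NL-SAS random IOPS (assumption)
raw=$((disks * iops_per_disk))

# RAID6 small-write penalty = 6; RAID10 penalty = 2
echo "Raw IOPS:           $raw"
echo "RAID6 write IOPS:  ~$((raw / 6))"
echo "RAID10 write IOPS: ~$((raw / 2))"
```

That works out to roughly 150 sustained random-write IOPS on RAID6 versus 450 on RAID10 for the same spindles. Once the controller cache fills, the array can only absorb writes at that sustained rate, which is why throughput falls off a cliff after the first burst.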

This has nothing to do with TCP/IP or your network.  

The controller and disks don't care what your budget is, by the way ;)

If you went to Solaris and used ZFS with a pair of the smallest SSDs you could find for the ZIL, you would be much better off. You could also enable compression at the filesystem level to get back some space. Use the RAIDZ2 configuration in ZFS, which is like RAID6 but better. [As Andrew suggested]
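For reference, that layout is only a couple of commands on Solaris (a sketch; the device names below are placeholders for your actual disks and SSDs):

```shell
# 12-disk RAIDZ2 pool with a mirrored SSD ZIL (log) and an SSD L2ARC
# (cache); device names are illustrative, list your own.
zpool create backuppool raidz2 \
    c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 \
    c0t6d0 c0t7d0 c0t8d0 c0t9d0 c0t10d0 c0t11d0 \
    log mirror c1t0d0 c1t1d0 \
    cache c1t2d0

# Enable filesystem-level compression to claw back some capacity
# (use compression=lz4 where the release supports it).
zfs set compression=on backuppool
```

The ZIL absorbs synchronous write bursts on SSD, so the slow NL-SAS spindles only ever see large sequential flushes.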

------------- OR -----------------
.. a suggestion. [Only if your software allows this ...] Buy two of the largest disks you can afford and build them as a RAID1. Modify the backup process so step one is to copy the files you want to back up to that RAID1; then back up from the RAID1 to your RAID6, then delete the files from the RAID1. You will still be protected in case of HDD loss and have two levels of protection.

The RAID1 will allow backups of the client machines to complete much more quickly, and the server can then take its sweet time migrating from the RAID1 to the RAID6 internally. Using multiple RAID levels as tiered backup storage pools is the textbook way of solving this problem frugally.
LVL 47

Assisted Solution

dlethe earned 166 total points
ID: 40204437
Or use a staging server that has lots of disk space and a RAID10 or RAID1. Back up there, then migrate. Use a second dual-ported NIC directly attached between the two systems specifically for the backup pipe, and bond the ports so you get twice the throughput. No need to even go through a switch.

That way the backup completes much faster, and your normal network bandwidth is not affected by the transfer from the temporary storage pool to the repository, which is a less expensive and slower tier.


Question has a verified solution.
