Solved

Why does my 10Gb iSCSI setup see such high latency, and how can I fix it?

Posted on 2014-07-17
892 Views
Last Modified: 2016-11-23
I have an iSCSI server set up with the following configuration:

Dell R510
PERC H700 RAID controller
Windows Server 2012 R2
Intel Ethernet X520 10Gb
12 nearline SAS drives
I have tried both StarWind and the built-in Server 2012 iSCSI target software, but I see similar results with each. I am currently running the latest version of StarWind's free iSCSI server.

I have connected it to a 10Gb port on an HP 8212 switch, which is also connected via 10Gb to our VMware servers. I have a dedicated VLAN just for iSCSI and have enabled jumbo frames on that VLAN.

I frequently see very high latency on my iSCSI storage, so much so that it can time out or hang VMware, and I am not sure why. I can run IOmeter and get some pretty decent results.

I am trying to determine why I see such high latency (100+ ms). It doesn't always happen, but several times throughout the day VMware complains about the latency of the datastore. I have a 10Gb iSCSI connection between the servers, and I wouldn't expect the disks to be able to max that out; the highest I could see when running IOmeter was around 5Gb/s. I also don't see much load at all on the iSCSI server when the high latency occurs. It seems network related, but I am not sure what settings I could check. The 10Gb connection should be plenty, as I said, and it is nowhere near maxed out.

Any thoughts on configuration changes I could make to my VMware environment or network card settings, or ideas on where to troubleshoot, would be appreciated. I am not able to find what is causing it. I referenced this document for changes to my iSCSI settings:

http://en.community.dell.com/techcenter/extras/m/white_papers/20403565.aspx

Thank you for your time.
iometer.csv
Question by:gacus
11 Comments
 
LVL 117

Expert Comment

by:Andrew Hancock (VMware vExpert / EE MVE)
So your "SAN" is running StarWind Software iSCSI, connected to VMware ESXi?
 
LVL 1

Author Comment

by:gacus
yes
 
LVL 117

Assisted Solution

by:Andrew Hancock (VMware vExpert / EE MVE)
Andrew Hancock (VMware vExpert / EE MVE) earned 167 total points
There are some specific iSCSI settings that we use, which are vendor-defined; I'll dig them out tomorrow and you could try them. They are recommended by HP and NetApp for their hardware SANs.

Have you configured multipathing?

Also, which version of ESXi are you using?
 
LVL 1

Author Comment

by:gacus
ESXi 5.5, build 1891313.

I only have one 10Gb connection for my iSCSI, so there is no need for multipathing. It should be plenty for this server. It would be nice to have a backup path, but the cost of an extra 10Gb connection makes it not an option.
 
LVL 56

Accepted Solution

by:
Cliff Galiher earned 167 total points
I know you probably won't like this answer (and, of course, you are welcome to try to find a better one), but I don't think there is much you can do in this situation. Several factors are accumulating to produce the behavior you are seeing.

First is the controller. The H700 is honestly a mid-range controller at best. The 800 series is a bit better, but for "roll your own SAN" solutions, none of the Dell controllers are very good. You really have to start considering going native, such as an LSI controller, if you want good performance. Dell doesn't expect their servers to be used as SANs, so they heavily optimize their drivers and caching routines for single-application access... or at least on-server access. Because of how iSCSI traffic flows, it can basically negate the entire controller cache, and obviously that comes with a performance hit.

Your second issue is the NL-SAS drives. The distinction between NL-SAS and SAS is a simple one: NL-SAS is a SATA drive with SAS firmware bolted on. Sure, it can "understand" SAS commands, but it doesn't really do the things that real SAS drives do, like queue reprioritization. A real SAS drive can take instructions from the controller and find the optimal way to process them. An NL-SAS drive will usually do minimal or (more often) no optimization and just handle requests in the order it received them, which, during heavy I/O or even moderate random I/O, can add sudden latency.

iSCSI on a server certainly has a place. For archival storage, backup storage, and similar uses, iSCSI on a target server is *great*: single streams of I/O, and if a failure occurs, reasonable downtime is not an issue. But for the usual place where people want a SAN, which is the use case you are currently describing, the benefits just aren't there. Servers aren't optimized for this use, and of course the whole point of running multiple VMware or Hyper-V nodes is to eliminate single points of failure... but with a "roll your own SAN" server, you've just kicked the can down the road and made the storage the single point of failure. That isn't particularly useful.

Truth is, given the platform you built, I think you'll just have to accept the latency. The bottleneck isn't the 10Gb link; it is the I/O on the target. Because your target is running Windows, you *do* have the benefit of turning on performance monitors and counters to verify this. I think that when you do, you'll find your actual disk queues on the target are high when the latency warnings appear, while your network utilization is still relatively low.

-Cliff
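
Here is a minimal sketch, not from the original thread, of how those counters could be captured on the Windows target with typeperf so disk queue depth can be lined up against NIC utilization whenever vSphere raises a datastore latency alarm. The counter paths are standard Windows ones; the 5-second interval and output file name are arbitrary choices.

```python
# Sketch: log disk-queue and NIC counters on the iSCSI target via typeperf
# (built into Windows), for comparison against the timestamps of the vSphere
# datastore latency warnings. Interval and output file are arbitrary.
import subprocess

COUNTERS = [
    r"\PhysicalDisk(_Total)\Avg. Disk Queue Length",  # queued I/O on the target
    r"\PhysicalDisk(_Total)\Avg. Disk sec/Transfer",  # per-I/O latency at the disks
    r"\Network Interface(*)\Bytes Total/sec",         # 10Gb NIC utilization
]

# Sample every 5 seconds until interrupted, writing a CSV for later review.
subprocess.run(
    ["typeperf", *COUNTERS, "-si", "5", "-f", "CSV", "-o", "iscsi_latency.csv"],
    check=True,
)
```

If the theory holds, the queue-length column should spike at the same times as the vSphere warnings while Bytes Total/sec stays well below what the 10Gb link can carry.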

 
LVL 1

Author Comment

by:gacus
All I am using this SAN for is VMware backup, so I don't need high performance or redundancy. We have high-end enterprise Fibre Channel SANs for our VMs. I understand what performance I should get out of my low-end drives and RAID controller, and I do get it sometimes. Other times I see high latency.

The issue is that I see hardly any load on my disks during the latency issues.
 
LVL 117

Expert Comment

by:Andrew Hancock (VMware vExpert / EE MVE)
If it's just for backup, performance should be fine?

Replace the OS, create a JBOD using an LSI HBA (no RAID), use ZFS on a Solaris implementation, and add a few SSDs for the ZIL and L2ARC cache.
 
LVL 47

Expert Comment

by:dlethe
Your problem is most likely due to your RAID config. Let me guess: reads are OK, but writes crap out after a few seconds sustained. If that is the case, it confirms it is your RAID config. No tweaking other than going to RAID 10 and smaller volumes will help.
 
LVL 1

Author Comment

by:gacus
Thank you all for your advice!  I really appreciate it!

Very interesting, dlethe. I use RAID 10 on all of our other systems, but since this is just for backup and I needed the space, I used RAID 6. Can you provide more detail on what you are saying? It does appear that it slows down over time, and reads are indeed noticeably faster.

Andrew: I don't have the budget to add to the configuration, which is why it is such a cheap build. I would also lose a good chunk of space by removing two of the drives. Any thoughts on what I could do without adding anything?
 
LVL 47

Expert Comment

by:dlethe
RAID 6 is slow as heck at writing. The reason it is fast for a second or so and then drops is that the controller cache buffers the writes; once the cache is full (so it actually has to write to the disks), throughput cranks down.

If you want fast writes, don't use RAID 6. It is that simple. Google articles about how RAID 5 and RAID 6 work and see for yourself.
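
To put rough numbers on that, here is a back-of-the-envelope sketch of the write penalty being described. The 75 IOPS/drive figure is an assumed ballpark for 7.2K nearline SAS disks; the penalties (6 back-end I/Os per small random write for RAID 6, 2 for RAID 10) are the textbook values.

```python
# Rough estimate of sustained small random-write IOPS once the controller
# cache is full, for a 12-drive array. 75 IOPS/drive is an assumed ballpark
# for 7.2K nearline SAS disks.
DRIVES = 12
IOPS_PER_DRIVE = 75

def sustained_write_iops(drives: int, per_drive: int, write_penalty: int) -> float:
    """Aggregate random-write IOPS = total back-end IOPS / write penalty."""
    return drives * per_drive / write_penalty

for level, penalty in (("RAID 6", 6), ("RAID 10", 2)):
    print(f"{level}: ~{sustained_write_iops(DRIVES, IOPS_PER_DRIVE, penalty):.0f} IOPS")

# Roughly 150 IOPS for RAID 6 versus 450 for RAID 10: once the cache fills,
# a write stream draining at ~150 IOPS is consistent with 100+ ms latency spikes.
```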

This has nothing to do with TCP/IP or your network.  

The controller and disks don't care what your budget is, by the way ;)

If you went to Solaris and used ZFS with a pair of the smallest SSDs you could find for the ZIL, you would be much better off. You could also enable compression at the filesystem level to get back some space. Use the RAIDZ2 configuration in ZFS, which is like RAID 6 but better. [As Andrew suggested]

------------- OR -----------------
A suggestion [only if your software allows this]: buy two of the largest disks you can afford and build them as a RAID 1. Modify the backup process so step one is to copy the files you want to back up to that RAID 1. Then back up from the RAID 1 to your RAID 6, and delete the files from the RAID 1. You will still be protected in case of HDD loss, and you have two levels of protection.

The RAID 1 will allow backups to complete much quicker on the machines, and then the server can take its sweet time migrating internally from the RAID 1 to the RAID 6. Using multiple RAID levels as backup storage pools is the textbook way of solving this problem frugally.
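
As an illustration only, here is a minimal sketch of that staging-then-migrate step, assuming the backup jobs can simply be pointed at a different path; the drive letters and the flat directory layout are hypothetical.

```python
# Sketch: backups land on the fast RAID 1 staging volume, then get migrated
# to the large RAID 6 volume and removed from staging. Paths are hypothetical.
import shutil
from pathlib import Path

STAGING = Path(r"E:\backup-staging")   # RAID 1 pair (assumed drive letter)
ARCHIVE = Path(r"F:\backup-archive")   # large RAID 6 volume (assumed drive letter)

def migrate_staged_backups() -> None:
    """Copy finished backup files from staging to the archive, then delete them."""
    for item in STAGING.iterdir():
        if item.is_file():
            shutil.copy2(item, ARCHIVE / item.name)  # slow tier can take its time
            item.unlink()                            # free the fast staging space

if __name__ == "__main__":
    migrate_staged_backups()
```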
 
LVL 47

Assisted Solution

by:dlethe
dlethe earned 166 total points
Or use a staging server that has lots of disk space and a RAID 10 or RAID 1. Back up there, and migrate. Use a second dual-ported NIC, direct-attached between these two systems, specifically for the backup pipe. Bond the ports so you get twice the throughput. No need to even go through a switch.

That way the backup completes much faster, and your normal network bandwidth is not affected by the copy from the temporary storage pool to the repository, which is a less expensive and slower tier.
