Solved

Why does my 10Gb iSCSI setup see such high latency, and how can I fix it?

Posted on 2014-07-17
11
934 Views
Last Modified: 2016-11-23
I have an iSCSI server set up with the following configuration:

Dell R510
Perc H700 Raid controller
Windows Server 2012 R2
Intel Ethernet X520 10Gb
12 nearline SAS drives
I have tried both StarWind and the built-in Server 2012 iSCSI software but see similar results. I am currently running the latest version of StarWind's free iSCSI server.

I have connected it to an HP 8212 10Gb port, which is also connected via 10Gb to our VMware servers. I have a dedicated VLAN just for iSCSI and have enabled jumbo frames on the VLAN.

I frequently see very high latency on my iSCSI storage, so much so that it can time out or hang VMware. I am not sure why; I can run IOmeter and get some pretty decent results.

I am trying to determine why I see such high latency (100+ ms). It doesn't always happen, but several times throughout the day VMware complains about the latency of the datastore. I have a 10Gb iSCSI connection between the servers, and I wouldn't expect the disks to be able to max that out; the highest I could see when running IOmeter was around 5Gb/s. I also don't see much load at all on the iSCSI server when I see the high latency. It seems network related, but I am not sure which settings to check. As I said, the 10Gb connection should be plenty, and it is nowhere near maxed out.

Any thoughts on configuration changes I could make to my VMware environment or network card settings, or ideas on where to troubleshoot this? I have not been able to find the cause. I referenced this document for changes to my iSCSI settings:

http://en.community.dell.com/techcenter/extras/m/white_papers/20403565.aspx

Thank you for your time.
iometer.csv
Question by:gacus
11 Comments
 
LVL 118
ID: 40202720
So your "SAN" is running Starwind Software iSCSI connected to VMware ESXi ?
LVL 1

Author Comment

by:gacus
ID: 40202982
yes
LVL 118

Assisted Solution

by:Andrew Hancock (VMware vExpert / EE MVE)
Andrew Hancock (VMware vExpert / EE MVE) earned 167 total points
ID: 40203026
There are some specific iSCSI settings we use that are vendor-defined; I'll dig them out tomorrow and you could try them. These are recommended by HP and NetApp for their hardware SANs.

Have you configured multipathing?

Also, which version of ESXi are you using?
LVL 1

Author Comment

by:gacus
ID: 40203036
ESXi 5.5, build 1891313

I only have one 10Gb connection for my iSCSI, so no need for multipathing. It should be plenty for this server. It would be nice to have a backup, but the cost of an extra 10Gb connection makes it not an option.
LVL 56

Accepted Solution

by:
Cliff Galiher earned 167 total points
ID: 40203442
I know you probably won't like this answer (and, of course, you are welcome to try to find a better one), but I don't think there is much you can do in this situation. Several factors accumulate to produce the behavior you are seeing.

First is the controller. The H700 is honestly a mid-range controller at best. The H800 series is a bit better, but for "roll your own SAN" solutions, none of the Dell controllers are very good. You really have to consider going with a native LSI controller if you want good performance. Dell doesn't expect its servers to be used as SANs, so it heavily optimizes its drivers and caching routines for single-application access, or at least on-server access. Because of how iSCSI flows, it can basically negate the entire controller cache, and obviously that comes with a performance hit.

Your second issue is the NL-SAS drives. The distinction between NL-SAS and SAS is a simple one. NL-SAS is a SATA drive with a SAS firmware bolted on. Sure, it can "understand" SAS commands, but it doesn't really do things that real SAS drives do, like queue reprioritization. A real SAS drive can take instructions from the controller and find the most optimal way to process them. An NL-SAS drive will usually do minimal or (more often) no optimization and just handle the requests in the order it received them. Which, during heavy I/O or even moderate random I/O, can add sudden latency.
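The queueing difference can be sketched with a toy model. The Python below is an illustration, not drive firmware: the LBA range, request count, and the greedy nearest-first policy are all simplifying assumptions. It compares the head travel of servicing a burst of random requests in arrival order, roughly what an NL-SAS drive does, against reordering them the way a SAS drive's command queue can:

```python
import random

def total_seek(order, start=0):
    """Sum of head-movement distances when servicing LBAs in the given order."""
    pos, dist = start, 0
    for lba in order:
        dist += abs(lba - pos)
        pos = lba
    return dist

def nearest_first(requests, start=0):
    """Greedy reordering: always service the closest pending request next,
    a crude stand-in for SAS command-queue optimization."""
    pending, pos, order = list(requests), start, []
    while pending:
        nxt = min(pending, key=lambda lba: abs(lba - pos))
        pending.remove(nxt)
        order.append(nxt)
        pos = nxt
    return order

random.seed(1)
queue = [random.randrange(1_000_000) for _ in range(32)]  # a burst of random I/O
print("FIFO seek distance:     ", total_seek(queue))             # NL-SAS style
print("Reordered seek distance:", total_seek(nearest_first(queue)))  # SAS style
```

Under random I/O the reordered total is a fraction of the FIFO total, which is where the sudden latency on the in-order drive comes from.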

iSCSI on a server certainly has a place. For archival storage, backup storage, and other uses, iSCSI on a target server is *great*: single streams of I/O, and if a failure occurs, reasonable downtime is not an issue. But for the usual place where people want a SAN, which is the use case you are currently describing, the benefits just aren't there. Servers aren't optimized for this use, and of course the whole point of running multiple VMware or Hyper-V nodes is to eliminate single points of failure...but with a "roll your own SAN" server, you've just kicked the can down the road to the storage being the single point of failure. That isn't particularly useful.

Truth is, given the platform you built, I think you'll just have to accept the latency. The bottleneck isn't the 10Gb; it is the I/O on the target. Because your target is running Windows, you *do* have the benefit of turning on performance monitors and counters to verify this. I think you'll find that when you do, your actual disk queues on the target are high when you see the latency warnings, while your network utilization is still relatively low.
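A rough way to connect those disk queues to the latency warnings is Little's law (time in system = items in system / throughput). The figures below are assumptions for illustration, not measurements from this server, but a queue of 32 outstanding I/Os against an array sustaining ~300 random write IOPS already lands in the 100 ms range the question describes:

```python
def avg_latency_ms(outstanding_ios, array_iops):
    # Little's law: average time in system = items in system / throughput
    return outstanding_ios / array_iops * 1000.0

# Assumed numbers: ~32 outstanding I/Os (a common ESXi per-LUN queue depth)
# against ~300 sustained random write IOPS once the controller cache is full.
print(round(avg_latency_ms(32, 300), 1))  # 106.7 (ms)
```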

-Cliff
LVL 1

Author Comment

by:gacus
ID: 40203480
All I am using this SAN for is VMware backup, so I don't need high performance or redundancy. We have high-end enterprise Fibre Channel SANs for our VMs. I understand what performance I should get out of my low-end drives and RAID controller, and I get it sometimes. Other times I see high latency.

The issue is I hardly see any load on my disks during the latency issues.
LVL 118
ID: 40203502
If it's just for backup, performance should be fine?

Replace the OS, create a JBOD using an LSI SAS HBA (no RAID), and use ZFS on a Solaris implementation with a few SSDs for the ZIL and L2ARC cache.
LVL 47

Expert Comment

by:dlethe
ID: 40203775
Your problem is most likely due to your RAID config. Let me guess: reads are OK, but writes crap out after a few seconds sustained. If that is the case, it confirms your RAID config is the issue. No tweaking other than going to RAID 10 and smaller volumes will help.
LVL 1

Author Comment

by:gacus
ID: 40204388
Thank you all for your advice!  I really appreciate it!

Very interesting, dlethe. I use RAID 10 on all of our other systems, but since this is just for backup and I needed the space, I used RAID 6. Can you provide more detail on what you are saying? It does appear that it slows down over time, and reads are indeed noticeably faster.

Andrew: I don't have the budget to add to the configuration, which is why it is such a cheap build. I would also lose a good chunk of space removing two of the drives. Any thoughts on what I could do without adding anything?
LVL 47

Expert Comment

by:dlethe
ID: 40204415
RAID 6 is slow as heck at writing. The reason it is fast for a second or so and then drops is the cache: the controller buffers the writes, and when the cache is full (so it actually has to write to the disks), it cranks down.

If you want speed for writes, don't use RAID 6. It is that simple. Google articles about how RAID 5 and RAID 6 work and see for yourself.
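The arithmetic behind that advice can be sketched as follows. The per-drive IOPS figure is an assumed typical value for 7.2k nearline SAS, not a measurement of this array, but the write-penalty ratios (disk operations per host write) are the standard ones for each RAID level:

```python
# Back-of-envelope sustained random-write IOPS once the controller cache is full.
DRIVES = 12
DRIVE_IOPS = 75  # assumed typical 7.2k nearline SAS random IOPS per drive
WRITE_PENALTY = {"RAID10": 2, "RAID5": 4, "RAID6": 6}  # disk ops per host write

for level, penalty in WRITE_PENALTY.items():
    iops = DRIVES * DRIVE_IOPS // penalty
    print(f"{level}: ~{iops} sustained random write IOPS")
# RAID10: ~450, RAID5: ~225, RAID6: ~150
```

Under these assumptions the RAID 6 array sustains only about a third of what the same twelve drives would deliver in RAID 10, which is why the cache masks the problem briefly and then latency spikes.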

This has nothing to do with TCP/IP or your network.  

The controller and disks don't care what your budget is, by the way ;)

If you went to Solaris and used ZFS with a pair of the smallest SSDs you could find for the ZIL, you would be much better off. You could also enable compression at the filesystem level to get back some space. Use the RAIDZ2 configuration in ZFS, which is like RAID 6 but better [as Andrew suggested].

------------- OR -----------------
A suggestion [only if your software allows this...]: buy two of the largest disks you can afford and build them as a RAID 1. Modify the backup process so step one is to copy the files you want to back up to that RAID 1. Then back up from the RAID 1 to your RAID 6, and delete the files from the RAID 1. You will still be protected in case of HDD loss and have two levels of protection.

The RAID 1 will allow backups to complete much quicker on the machines, and then the server can take its sweet time migrating from the RAID 1 to the RAID 6 internally. Using multiple RAID levels as storage backup pools is the textbook means of solving this problem frugally.
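That two-stage flow could be scripted along these lines. The function name and paths are hypothetical, and a real job would add error handling, logging, and free-space checks; this is only a sketch of the ingest-fast, migrate-slow pattern:

```python
import shutil
from pathlib import Path

def stage_then_migrate(src: Path, staging: Path, repo: Path) -> Path:
    """Step 1: copy the backup file to the fast RAID 1 staging volume
    (short backup window). Step 2: migrate it to the slower RAID 6
    repository at leisure, clearing the staging copy."""
    staging.mkdir(parents=True, exist_ok=True)
    repo.mkdir(parents=True, exist_ok=True)
    staged = staging / src.name
    shutil.copy2(src, staged)          # fast tier: what the clients wait on
    final = repo / src.name
    shutil.move(str(staged), final)    # slow tier: internal migration
    return final
```

`shutil.move` falls back to copy-and-delete when staging and repository sit on different volumes, which is exactly the case here.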
LVL 47

Assisted Solution

by:dlethe
dlethe earned 166 total points
ID: 40204437
Or use a staging server that has lots of disk space and a RAID 10 or RAID 1. Back up there, and migrate. Use a second dual-ported NIC direct-attached between these two systems specifically for the backup pipe, and bond the ports so you get twice the throughput. No need to even go through a switch.

That way backup completes much faster, and your normal network bandwidth is not affected by the migration from the temporary storage pool to the repository, which is a less expensive and slower tier.