Expected Netapp IOP Performance

Dear All
I am looking for some feedback regarding benchmark testing on our new NetApp FAS2552 running Data ONTAP 8.2RC1 in 7-Mode, with VMware using iSCSI.

I have connected the NetApp to VMware ESXi 5.5 with a 10 Gb connection from the SAN to the switch and from the switch to the host. I set up Iometer to test using 4K blocks in sequential reads with 16 outstanding I/Os. I have limited the sample size to 100 MB, which effectively makes this a cache run on the SAN, and I have confirmed the read has a 100% cache hit rate.

I am surprised to see that the maximum number of IOPS is only 20,000, with less than 90 MB/s being utilised. I would have thought that if the NetApp was serving 100% from cache memory, the IOPS level would be significantly higher.
I have also benchmarked against a LUN presented from the NetApp directly to a Windows 2008 OS using the software initiator within the Windows operating system, and I am seeing very similar performance. Given that everything is being served from cache, does IOPS at this level seem right? Your thoughts would be greatly appreciated.
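As a quick sanity check on the numbers quoted above (illustrative arithmetic only, not NetApp tooling): 20,000 IOPS at a 4 KiB block size accounts almost exactly for the observed sub-90 MB/s figure, so the two measurements are consistent with each other.

```python
# Does 20,000 IOPS at 4 KiB per IO explain the ~90 MB/s observed?
iops = 20_000
block_size_bytes = 4 * 1024              # 4 KiB blocks, as configured in Iometer
throughput_mb_s = iops * block_size_bytes / 1_000_000
print(f"{throughput_mb_s:.1f} MB/s")     # 81.9 MB/s, consistent with "<90 MB/s"
```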
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
(As a point, you should be on at least 8.2.2 GA by now!)

How many IOPS do you require?

Do you have jumbo frames enabled?

Do you have multipathing enabled?

Have you applied the recommended settings to the ESXi hosts as per NetApp guidelines?

What disks do you have? How are they configured?
James-Sillett (Author) commented:
The Filer was only delivered direct from NetApp a couple of days ago; I assumed the latest version of ONTAP would have been installed at delivery. I will look to update before I put this into production — a bit poor on NetApp's part.
The required IOPS figure is not relevant; I am stress testing each part of the setup for benchmarking purposes. I do not have jumbo frames on at the moment. I will when we go into production, however given that the test I am doing uses 4K blocks and the network path is less than 10% utilised, I would not have thought this would make a difference to this stress test. All guidelines from the various white papers have been followed. The disks are again not relevant, as this is 100% read-only coming directly from cache — i.e. a 100% cache hit rate (sysstat) with almost 0% disk utilisation — which is the reason I would have expected higher IOPS. The point of these tests is to understand the limitation of each part of the setup.
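It may help to note what the observed figures imply about per-IO round-trip time. By Little's Law (a standard queueing result, applied here as an illustration rather than anything from the thread), 16 outstanding IOs sustaining 20,000 IOPS implies an average service time of about 0.8 ms per IO — so the test is latency-bound, not bandwidth-bound.

```python
# Little's Law sketch: average latency = outstanding IOs / IOPS
outstanding = 16          # Iometer "outstanding IOPs" setting from the question
iops = 20_000             # observed ceiling
latency_ms = outstanding / iops * 1000
print(f"{latency_ms:.2f} ms per IO")   # 0.80 ms average round trip implied
```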
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
We've had the same experience: direct delivery from NetApp without the latest builds (always!).

We've also trashed the configuration and set it up again. You may also want to switch to NFS — far better performance than iSCSI, as iSCSI carries additional overhead, and NetApp now recommends and pushes NFS rather than iSCSI.

Same inside the Filer: NFS has lower overhead than iSCSI.

Do you have cache? SSDs? (e.g. a hybrid aggregate, or a plain disk aggregate?)
James-Sillett (Author) commented:
I've been pondering NFS for a while; it's just a massive change for our infrastructure, but I'll do some performance benchmarking and see if it makes any difference. We haven't got any SSDs — it's a straight aggregate — but as per a white paper on benchmarking, I kept the LUN size below the 18 GB of ECC RAM so that Iometer would pull everything from cache. So I would expect to be getting the same IOPS as, or more than, I get from my cheap £100 desktop SSD.
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
When we benchmarked our NetApp Filers using Iometer and queried NetApp support about the results (we don't bother any more), they said it would take two weeks before the IOPS stabilised!

Because of the cache — that's what they said!

Are your disks SATA or SAS?
Benchmark testing should reflect real world situations as closely as possible.

Purely theoretical benchmarks add little value and are essentially a waste of time.

Consider the whole stack that is involved: not only the NetApp, but the networking infrastructure, the physical server, the VMware ESXi software stack, and finally the OS on which you are testing.

All of these components can introduce latency, and it is latency that is limiting performance (average 0.69 ms with peaks of 118 ms, according to your screenshot).

Latency especially matters when using small blocks and no matter how big your bandwidth is, you can never fill up the pipe if latency is high. Both endpoints must wait until outstanding traffic is confirmed before sending more blocks.
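The claim above can be made concrete with a small calculation (illustrative only, using the 0.69 ms average latency quoted from the screenshot): with a fixed queue depth, average latency puts a hard ceiling on IOPS no matter how much bandwidth the 10 GbE link offers.

```python
# IOPS ceiling = in-flight IOs / average round-trip time
latency_s = 0.69 / 1000    # 0.69 ms average, per the screenshot
queue_depth = 16           # outstanding IOs configured in Iometer
max_iops = queue_depth / latency_s
print(f"{max_iops:,.0f} IOPS ceiling")   # ~23,188 — close to the observed 20,000
```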

Try testing with larger block sizes (64K, 128K, 512K) and you will see that throughput increases while IOPS decrease.
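A hypothetical sweep illustrates why: if the setup stays latency-limited to roughly 23,000 IOPS (the queue-depth estimate above) until the ~10 Gb/s wire caps total bandwidth, then bigger blocks push throughput up while the achievable IOPS fall. The numbers here are modelling assumptions, not measurements from this Filer.

```python
# Model: throughput = min(latency-limited IOPS * block size, wire speed)
wire_bytes_s = 1.25e9            # ~10 Gb/s link, assumed ideal
latency_limited_iops = 23_000    # assumed from the queue-depth/latency estimate
for kib in (4, 64, 128, 512):
    block = kib * 1024
    bytes_s = min(latency_limited_iops * block, wire_bytes_s)
    iops = bytes_s / block
    print(f"{kib:>4} KiB: {bytes_s / 1e6:7.0f} MB/s at {iops:6.0f} IOPS")
```

At 4 KiB the run is latency-bound (~94 MB/s); from 64 KiB upward the wire becomes the limit and IOPS drop as block size grows, matching the comment above.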

But as I said at the start, synthetic benchmarks are mostly a waste of time and give little idea of real-world performance.
James-Sillett (Author) commented:
I agree with what you say. However, the point of the tests is to work out mathematically what each component is capable of achieving, then stress test each part to make sure the measured performance of the component matches what is expected. Knowing what each part can do — host, NIC, switch, SAN — means that when the system is live, you can monitor each of those components to see how they respond to real-world load.
Still, you are only measuring the performance of the entire stack, not of the NetApp alone. Perhaps most of the latency comes from VMware — who knows? And it is latency that causes bottlenecks at the protocol level.