• Status: Solved

Disk Read Bytes per sec, IOPS, and Disk Reads per sec?

We are in the process of determining whether we need a new SAN or not. A tool was provided to me by NetApp to measure the performance of our disk system.

Currently we have a Dell EMC AX100.

The specifications state that it can do

30,000 I/Os and 150 MB per second with 11 7200 rpm SATA drives

OK, so I ran this tool from NetApp. It produces some graphs that show

X1 = Disk reads Per Sec
X2 = Disk Read bytes Per Sec
Y = Time

Their graph shows it topping out, with random reads, at

130.00 Disk Reads Per Sec
1600000.00 Disk Read Bytes Per Sec

With an average of

40 - 60 Disk Reads Per Sec
500000 - 700000 Disk Read Bytes Per Sec

Now I am aware that Dell's AX100 numbers are on paper and do not represent live numbers.

But I'm missing how 30,000 I/Os and 150 MB per second correspond to Disk Read Bytes per sec and Disk Reads per sec.

In closing, we believe we're maxing out the disk performance of our SAN, but I want to be clear on the numbers before I suggest a replacement.

Thanks in advance.

2 Solutions
The big picture is that benchmarks measure artificial loads that rarely represent YOUR combination of large- and small-block IOPS and throughput workloads per host and LUN.

You need to approach the problem by determining YOUR I/O requirements. All operating systems have utilities to report I/O utilization. Use them. Learn what you require, then learn where the bottlenecks are.

Your problem is that you are looking at solutions without understanding what you need. It is like buying a vehicle based on horsepower ... without bothering to check whether you need a sports car, truck, or airplane.

kblackwel (Author) Commented:
Dear dlethe,

Then I have to ask,

I don't know if there's a way to determine how much IOPS we require for our situation.

We use our SAN for folder redirection of user profiles. I've been over this a few times and no one can tell me what our IOPS requirement would be. If you know a way to measure that with different users using different applications in a remote desktop environment, PLEASE let me know.

What I think I really have to work with now is determining where the bottleneck is.

Our SAN is connected to a Windows 2003 file server machine with a 2 gig fibre link to the SAN, and the server shares the directories through a bonded 2 gig Ethernet connection.

I'm attempting to put together numbers to verify this, but I can see that

Network bandwidth is being saturated during heavy morning hours when everyone logs in and pulls their profile and redirected folders.

The 2 gig fibre connection is also getting saturated during these hours.

So now I think all I can do is determine if we're bouncing up against the limits of our SAN and, if so, expand.

Am I incorrect in this thinking?
Mark Wills (Topic Advisor) Commented:
Yep, pretty much - but it is all in "how" you determine that you are "bouncing against the limits"

Difficulty with SAN is measuring the right thing... Need to consider a lot of different things in a SAN configuration such as types of RAID being used, caching etc...

There is a good insight into the types of measures for disk I/O on the MS website. It does say it is for Windows 2000 Server, but don't worry about that as much as what the measures are, how they can help, and their analysis. Bit of reading, but very worthwhile (despite Windows 2000 Server). http://technet.microsoft.com/en-us/library/cc938964.aspx

Also have a look at the different measures - note the "transfers", which is what you need to compare with the manufacturer-supplied data : http://technet.microsoft.com/en-us/library/cc776376(WS.10).aspx

So, there are a few things you will need to consider when trying to measure disk performance: not just read bytes / second, but transfers, queue length etc... You will also need to understand your disk service consumers - are they random access, are they largely sequential - and then the configurations / LUNs: how have they been separated, is load evenly distributed across a number of spindles etc... which is kinda what dlethe was saying above...
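As a rough way to tie the counters in the question to the vendor figures: bytes per second is operations per second multiplied by the average size of each operation. The sketch below only does that arithmetic. It assumes the 130 figure quoted in the question is Disk Reads/sec, and it takes 1 MB as 10**6 bytes, which the spec sheet may or may not intend. The function name is made up for illustration.

```python
def avg_io_size_bytes(bytes_per_sec: float, ops_per_sec: float) -> float:
    """Average number of bytes moved per I/O operation."""
    if ops_per_sec <= 0:
        raise ValueError("ops_per_sec must be positive")
    return bytes_per_sec / ops_per_sec


# Peak figures quoted in the question (assumption: 130 is Disk Reads/sec).
peak_read_size = avg_io_size_bytes(1_600_000, 130)

# Vendor figures: 150 MB/s and 30,000 I/Os per second (assumption: MB = 10**6 bytes).
vendor_implied_size = avg_io_size_bytes(150 * 10**6, 30_000)

print(f"measured average read size: {peak_read_size:.0f} bytes")
print(f"I/O size at which both vendor maxima would coincide: {vendor_implied_size:.0f} bytes")
```

If the two vendor numbers were measured under different conditions (for example small I/Os for the I/O count and large sequential I/Os for the MB per second), they would not be reachable at the same time, so the second figure is only a consistency check, not a claim about how the AX100 was tested.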

Also, check for any SAN whitepapers on throughput and "best practices"; there is often some additional information "out there" (e.g. google it). But based on the disks being 7200 rpm disks, there are choices depending on straight throughput performance, or cost benefit. An example would be using 15K rpm drives in a multi-RAID array. But all of that type of discussion is kind of academic and is really based on your business requirement (and budget).

There is a white paper for SQL which discusses elements of IO performance as well, while a lot of it is with regard to SQL, there are some general "benchmarks" that can be used for comparative purposes : http://msdn.microsoft.com/en-us/library/cc966540.aspx#EFAA

Main points of interest from that document are :

You can use the following performance counters to identify I/O bottlenecks. Note, these AVG values tend to be skewed (to the low side) if you have an infrequent collection interval. For example, it is hard to tell the nature of an I/O spike with 60-second snapshots. Also, you should not rely on one counter to determine a bottleneck; look for multiple counters to cross check the validity of your findings.
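To illustrate the point about collection intervals, here is a minimal sketch with made-up per-second latency samples (55 quiet seconds followed by a 5-second spike). It treats every second as equally weighted, which is a simplification of how the real counters average over operations. The sample values and the helper name are hypothetical.

```python
# Hypothetical per-second read latencies in ms: 55 quiet seconds, then a 5-second spike.
samples_ms = [8.0] * 55 + [120.0] * 5


def interval_averages(samples, interval):
    """Average consecutive, non-overlapping windows of `interval` samples."""
    averages = []
    for start in range(0, len(samples), interval):
        window = samples[start:start + interval]
        averages.append(sum(window) / len(window))
    return averages


one_minute = interval_averages(samples_ms, 60)   # one value for the whole minute
five_second = interval_averages(samples_ms, 5)   # twelve values

print(f"60-second average: {one_minute[0]:.1f} ms")
print(f"worst 5-second average: {max(five_second):.1f} ms")
```

With these invented numbers the one-minute average stays in the "okay" band while the worst five-second window is far above it, which is the kind of spike a coarse interval hides.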

PhysicalDisk Object: Avg. Disk Queue Length represents the average number of physical read and write requests that were queued on the selected physical disk during the sampling period. If your I/O system is overloaded, more read/write operations will be waiting. If your disk queue length frequently exceeds a value of 2 during peak usage of SQL Server, then you might have an I/O bottleneck.

Avg. Disk Sec/Read is the average time, in seconds, of a read of data from the disk. Any number:

Less than 10 ms - very good
Between 10 - 20 ms - okay
Between 20 - 50 ms - slow, needs attention
Greater than 50 ms - serious I/O bottleneck

Avg. Disk Sec/Write is the average time, in seconds, of a write of data to the disk. Please refer to the guideline in the previous bullet.
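The latency bands above can be sketched as a small helper. The counter reports seconds, so the value is converted to milliseconds first. The guideline does not say which band the exact boundaries (10, 20, 50 ms) belong to, so the choice below is an assumption, and the function name is hypothetical.

```python
def classify_disk_latency(avg_disk_sec: float) -> str:
    """Map an Avg. Disk Sec/Read or Sec/Write value (in seconds) to a guideline band."""
    if avg_disk_sec < 0:
        raise ValueError("latency cannot be negative")
    ms = avg_disk_sec * 1000.0
    if ms < 10:
        return "very good"
    if ms <= 20:   # assumption: exactly 20 ms still counts as "okay"
        return "okay"
    if ms <= 50:   # assumption: exactly 50 ms still counts as "slow"
        return "slow, needs attention"
    return "serious I/O bottleneck"


for sample in (0.004, 0.015, 0.035, 0.080):
    print(f"{sample * 1000:.0f} ms -> {classify_disk_latency(sample)}")
```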

Physical Disk: %Disk Time is the percentage of elapsed time that the selected disk drive was busy servicing read or write requests. A general guideline is that if this value is greater than 50 percent, it represents an I/O bottleneck.

Avg. Disk Reads/Sec is the rate of read operations on the disk. You need to make sure that this number is less than 85 percent of the disk capacity. The disk access time increases exponentially beyond 85 percent capacity.

Avg. Disk Writes/Sec is the rate of write operations on the disk. Make sure that this number is less than 85 percent of the disk capacity. The disk access time increases exponentially beyond 85 percent capacity.

When using above counters, you may need to adjust the values for RAID configurations using the following formulas.

RAID 0 -- I/Os per disk = (reads + writes) / number of disks
RAID 1 -- I/Os per disk = [reads + (2 * writes)] / 2
RAID 5 -- I/Os per disk = [reads + (4 * writes)] / number of disks
RAID 10 -- I/Os per disk = [reads + (2 * writes)] / number of disks

For example, you have a RAID-1 system with two physical disks with the following values of the counters.

Disk Reads/sec            80
Disk Writes/sec           70
Avg. Disk Queue Length    5
In that case, you are encountering (80 + (2 * 70))/2 = 110 I/Os per disk and your disk queue length = 5/2 = 2.5, which indicates a borderline I/O bottleneck.
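The four formulas and the RAID-1 example can be sketched in a few lines. This just encodes the write multipliers from the formulas as given (1, 2, 4, 2) and fixes the RAID 1 divisor at 2; it does not model cache, stripe size, or controller behavior. The function and dictionary names are illustrative only.

```python
# Write multipliers taken from the formulas above.
WRITE_PENALTY = {"0": 1, "1": 2, "5": 4, "10": 2}


def ios_per_disk(raid_level: str, reads: float, writes: float, disks: int) -> float:
    """Estimate physical I/Os per disk from host-level reads/sec and writes/sec."""
    if raid_level not in WRITE_PENALTY:
        raise ValueError(f"unsupported RAID level: {raid_level}")
    if raid_level == "1":
        disks = 2  # the RAID 1 formula always divides by 2
    if disks <= 0:
        raise ValueError("disks must be positive")
    return (reads + WRITE_PENALTY[raid_level] * writes) / disks


# The RAID-1 example: 80 reads/sec, 70 writes/sec, queue length 5, two disks.
per_disk = ios_per_disk("1", reads=80, writes=70, disks=2)
queue_per_disk = 5 / 2

print(per_disk)        # 110.0
print(queue_per_disk)  # 2.5
```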

Great article, Mark. It is a good balance between oversimplification and enough information to get your point across. I think a caveat is due in the IOPS calculations for RAID. Depending on the hardware implementation, stripe size, cache buffers, cache settings, queue depth, workload, RAID level, whether read or write, and I/O request size, the physical disk I/Os can be profoundly different.

Simple example, on RAID1 reads, most engines would do one I/O from whatever disk can service the request sooner.  On writes, 2 I/Os have to get done, but if writeback is enabled, the calling program returns immediately, so could appear to be zero I/Os unless system is loaded.  Or it could be 1 I/O, and it acknowledges only after one disk writes, or with write through cache enabled, the cost for a write is 2 IOs.

If you have 4 sequential writes, and I/O request size is optimized, then you could end up with only ONE I/O per disk.   Conversely, if the chunk size is incorrect, then 1 host I/O could just as easily require 4 I/Os per disk.

So if RAID is part of the equation, you MUST look at how it is set up beyond RAID level.
Still, it is good enough and a great tutorial.
Mark Wills (Topic Advisor) Commented:
Thanks dlethe, and absolutely agree with your added comments above (along with your opening post as well)...

Question has a verified solution.
