# IOPS question

If I have 15K or 10K SCSI drives, how to find the IOPS for it? These are 300 GB drives.

Commented:
15K drives generate about 180 IOPS, while 10K drives generate about 120 IOPS. The total IOPS per drive increases as the queue depth increases, to a maximum of about 2 to 2.5 times the base performance, so a 15K drive can deliver up to around 450 IOPS before response time goes completely crazy. YMMV, though - the maximum IOPS varies with workload, I/O size and so on, but work from those base figures and you'll be right.


Commented:
FWIW - you can work it out from the rotational speed and seek times:

15,000 rpm / 60 = 250 rotations per second

One full rotation = 4 ms (1/250 * 1000)
Assume average I/O time is half a rotation + average seek + settle time, so:

2 ms + 2.5 ms + 1 ms = 5.5 ms per I/O
Number of 5.5 ms I/Os in 1 second = 1/0.0055 = ~180 IOPS
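That arithmetic can be sketched in a few lines of Python (the 2.5 ms seek and 1 ms settle figures are the assumed averages above, not values from any particular spec sheet):

```python
def disk_iops(rpm, avg_seek_ms, settle_ms):
    """Estimate a drive's base IOPS from rotational speed plus seek and settle times."""
    rotation_ms = 1000.0 / (rpm / 60.0)       # one full rotation, in milliseconds
    avg_io_ms = rotation_ms / 2 + avg_seek_ms + settle_ms
    return 1000.0 / avg_io_ms                 # I/Os per second

# 15K drive: 2 ms half-rotation + 2.5 ms seek + 1 ms settle = 5.5 ms per I/O
print(round(disk_iops(15000, 2.5, 1.0)))      # prints 182, i.e. roughly the 180 IOPS rule of thumb
```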
Author Commented:
Thank you very much for the explanation. I'm new to this. Can you give a little bit more info on

Average Latency
Rotational Latency
settle time

We need to order more disks by tomorrow for our projects.
Author Commented:
Don't worry about it. I'm googling it now and I will get the info.
Commented:
>Average Latency
You add half a rotation to average seek and settle time and you have the average latency. The real latency may be close to zero - that is, the next bit of data to be read or written is about to fly under the disk heads. On the other hand, the next bit of data might be on the innermost track while you're at the outside (track 0), so you have to seek all the way in to the center of the drive and wait up to one full rotation for the data to come under the heads.

>Rotational Latency
Time taken for one revolution. 15,000 rpm = 250 revolutions per second, or one revolution in 4 ms.

>settle time
Once the heads get to the right track, they have to settle - that is, find the absolute center of the track (and it's tiny). If there's a bit of vibration caused by other nearby disks seeking, or by someone shouting at the disks (strange but true), the vibration will send the heads slightly off-track, so the disk has to correct the movement.

Once you start looking into how disks actually work, you realise it's a wonder the dam' things work at all. A 15K disk is a triumph of engineering over physics.

Author Commented:
meyersd,

How did you come up with this calculation?

One full rotation = 4 ms (1/250 * 1000) ??
Commented:
1/250 = 0.004 seconds; multiply by 1000 to get 4 milliseconds.
Author Commented:
Thank you. How did you come up with average seek = 2.5 ms and settle time = 1 ms?
Commented:
You'll find those specs on most enterprise storage drives - take a look at this one for example: http://www.seagate.com/www/en-us/products/enterprise-hard-drives/cheetah-15k#tTabContentSpecifications

Quoted seek time often includes settle time - it depends on the drive manufacturer.

Interestingly, most drive manufacturers have stopped publishing anything but the broadest specs on personal storage drives.
Author Commented:
If we are putting 5 drives into a RAID5 group, how much penalty will there be? How many IOPS will we get out of the RAID5 configuration?
Commented:
5 x 180 for 15K drives = 900 IOPS. Note that, depending on the workload, that figure can peak at about 2250 IOPS, but at that load response time will be truly awful and application performance will suffer. The write cache on the RAID controller will help buffer the write workload - but how much depends on the RAID controller. More expensive RAID controllers have more write cache and smarter algorithms to improve write performance - you get what you pay for...

My apologies for the delay in my response - I'm on the other side of the Pacific....

Commented:
If we are putting 5 drives into a RAID5 group, how much penalty will there be?
- RAID5 and 50 have a write penalty of 4. RAID 10 has a write penalty of 2.

How many IOPS will we get out of the RAID5 configuration?
- I want to expand a bit on what meyersd has written, because this is a common question, but it's a bit like putting the cart before the horse. Meyersd is right if you have 5 drives each capable of a sustained 180 IOPS - but that doesn't mean your usable total is simply 180 x 5 = 900, because it doesn't take into account other factors such as disk controller cache. For the sake of discussion, if the total sustained IOPS capable of being produced is 1200, is that then your total IOPS? The answer is no. The reason is that the IOPS generated are largely driven by what your applications demand, not by the maximum that can be provided. Your storage solution will never constantly deliver its maximum, and if it did, its life span would not be long. Keep in mind, too, that any time you push past 85% of the maximum IOPS, performance goes in the toilet. A better question to ask is: what is the total number of IOPS needed, and is RAID5 the best RAID type for my requirements?

I would venture to say most SMBs have a read/write ratio somewhere in the neighborhood of 80/20. With that in mind, RAID5 is a suitable consideration, because it favors reads over writes. This is also true of RAID50. RAID50 has a higher parity calculation requirement, though, which will result in slightly lower IOPS capability, but you do get the added safety of double drive failure - provided they are not in the same parity set. RAID10 has a low write penalty, but to get that you have to sacrifice a lot of disk space, so unless you have a high write demand I wouldn't use it. Going back to the question of IOPS: you need to find out what your requirements are (application demand), because you will need to adjust the number of hard disk drives accordingly. Remember, you don't want to exceed that 85% mark, and you need to account for future growth, so go big if you can afford it.
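As a rough Python sketch of turning front-end application demand into a drive count, using the write penalties above and the 85% ceiling mentioned (the 180 IOPS per drive and the example numbers are assumptions for illustration, not measurements):

```python
import math

# Write penalties from the discussion above: RAID5/RAID50 = 4, RAID10 = 2.
WRITE_PENALTY = {"RAID10": 2, "RAID5": 4, "RAID50": 4}

def drives_needed(app_iops, write_fraction, raid, drive_iops=180, max_util=0.85):
    """Back-end IOPS = reads + writes x penalty; size the group to stay under max_util."""
    reads = app_iops * (1 - write_fraction)
    writes = app_iops * write_fraction
    backend = reads + writes * WRITE_PENALTY[raid]
    return math.ceil(backend / (drive_iops * max_util))

# e.g. 1000 front-end IOPS at an 80/20 read/write mix:
# RAID5:  800 + 200*4 = 1600 back-end IOPS -> 11 drives
# RAID10: 800 + 200*2 = 1200 back-end IOPS -> 8 drives
print(drives_needed(1000, 0.20, "RAID5"), drives_needed(1000, 0.20, "RAID10"))
```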
Commented:
>but you do get the added safety of double drive failure.
I have to argue the point on that one - if you get a a double drive failure in the same RAID 5 set, then you lose all your data. You are correct, though, that RAID 50 will sustain a double drive failure providing the drive failure is in separate RAID 5 sets. The same is true for RAID 1/0 - if you lose both halves of the same mirror pair, you're toast, but the risk is considerably lower.

With modern 450GB and 600GB 15K drives, RAID 1/0 is often a better choice:

Assume a workload of 8000 IOPS, 75% read, 25% write, 8K block size, 100% random.

Workload at the disks for RAID 1/0:
6000 read + (2000 x 2) write = 10,000 IOPS.
Assume 180 IOPS per 15K FC drive:
10000/180 = 55.6, so round up to 56 drives - which also satisfies RAID 1/0's requirement for an even number of drives.

Workload at the disks for RAID 5:
6000 read + (2000 x 4) write = 14,000 IOPS.
Assume 180 IOPS per 15K FC drive:
14000/180 = 78 drives to handle the same workload.

So you can see that RAID 1/0 requires fewer drives for the same workload, and provided your space requirement is fulfilled, RAID 1/0 is the better choice.
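The two worked examples above can be reproduced in a few lines (a sketch only, using the same assumed 180 IOPS per 15K drive):

```python
import math

app_iops, write_frac, per_drive = 8000, 0.25, 180
reads = app_iops * (1 - write_frac)       # 6000 read IOPS
writes = app_iops * write_frac            # 2000 write IOPS

backend_r10 = reads + writes * 2          # 10,000 IOPS at the disks (RAID 1/0 penalty 2)
backend_r5 = reads + writes * 4           # 14,000 IOPS at the disks (RAID 5 penalty 4)

drives_r10 = math.ceil(backend_r10 / per_drive)
drives_r10 += drives_r10 % 2              # RAID 1/0 needs an even drive count
drives_r5 = math.ceil(backend_r5 / per_drive)

print(drives_r10, drives_r5)              # prints 56 78
```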

Of course, Enterprise Flash Drives change the equation. You'd only need 6 drives to handle the same workload. EFDs are a ton more expensive per drive, but you need a whole heap fewer of them, so EFDs can cost you less overall - especially if you factor in the lower power and cooling costs.

EMC's FAST on V-MAX arrays is the business for EFDs: it automatically moves 'hot' blocks into EFD, not-so-hot blocks into FC, and data that isn't being accessed much into SATA, so you get the best possible bang for your buck and need a whole heap fewer drives.
Commented:
All that is true, but very costly, and based on the original question I seriously doubt he needs it.

If we're discussing vendored solutions like EMC or Equallogic, I'd bet that for the majority of SMBs they would recommend RAID5. EMC utilizes a hot spare pool, and you allocate the drives you want into the pool. EMC doesn't offer RAID50, and Equallogic automatically dedicates two drives when you select RAID5, so you still get the benefit of surviving a double drive failure. Equallogic does offer RAID50, and the only reason for it I can think of is that it may have a faster rebuild time on drive failure.

With EMC you also need to take into consideration that the number of storage processors is fixed. On an EMC Clariion CX3 series you get two. That means that for every DAS (disk array) you add, the overall performance goes down, because the bottleneck is the storage processor. That makes RAID10 a very expensive option with EMC. With Equallogic, the number of storage controllers scales up with every array you add, and each storage controller contains its own dedicated 4GB of cache, making RAID10 a much more viable option with Equallogic - but still not exactly necessary.

Outside of a vendored solution I agree RAID10 would suit better if you can afford it.
Commented:
>All that is true, but very costly, and based on the original question I seriously doubt he needs it.
The point I was trying to get across is that although RAID 1/0 has poor space efficiency compared to a parity RAID scheme, it can be a cheaper alternative to RAID 5 because it needs fewer spindles to provide the same performance.

>EMC doesn't offer RAID50
They do - they just don't call it that. A striped metaLUN built on RAID 5 groups is a RAID 50 construct.

>With EMC you also need to take into consideration that the number of storage processors is fixed. On an EMC Clariion CX3 series you get two. That means that for every DAS (disk array) you add, the overall performance goes down, because the bottleneck is the storage processor. That makes RAID10 a very expensive option with EMC. With Equallogic, the number of storage controllers scales up with every array you add, and each storage controller contains its own dedicated 4GB of cache, making RAID10 a much more viable option with Equallogic - but still not exactly necessary.

DAS is Direct Attached Storage - I believe you mean DAE, Disk Array Enclosure. Yes, the number of storage processors is fixed at two per CLARiiON CX series array, but the storage processors can handle up to 120, 240, 480 or 960 drives depending on the model. The array doesn't need to add processing power for an additional rack of disks - it has ample. Your assertion that overall performance goes down as you add disks is incorrect - performance scales linearly with the number of drives, up to the maximum the array supports.

In short, the choice of RAID type should be determined by business requirements and workload. RAID 1/0 works best for highly random workloads, and with the large disks available, space requirements shouldn't be a deterrent.

BTW - do you work for Dell?
Commented:
lol ... No, I don't work for Dell, but I have worked for a few small IT consulting companies serving SMBs where Dell is the dominant vendor.

I will concede that my information on the EMC storage processors is a bit dated, and the SPs may not have the same bottleneck issue they once did. However, I believe my assertion is largely correct that RAID10 is an expensive proposition and RAID5 shouldn't be discounted. I do disagree, though, that the RAID type should be determined by the business requirement - it should be determined by the IOPS requirement, governed by application demand, future growth predictions, and cost.
Commented:
You can have business requirements drive storage type if you use an ITIL-type storage catalogue and charge business units or departments according to the tier of storage they use:

Tier 1: RAID 1/0 FC, 15K
Tier 2: RAID 5 FC 15K
Tier 3: RAID 5 FC 10K
Tier 4: RAID 6 SATA.

Then you can have a self-funding IT department and, as an added bonus, if app owners have to pay for what they use, they don't make silly demands for performance they never use.
Commented:
Content 'R' us...  :-)
Author Commented:
Thank you for all of your comments - I'm going through them now. We have an EMC NS20 storage array, and our manager may have bought more disks yesterday; I'll give you the latest disk info. I'm in the process of designing now. If I have any questions, I'll let you guys know. Thanks again.
Author Commented:
meyersd,

Thank you. I understand the calculation, and we may go with a RAID5 configuration and a RAID1 configuration. Is there a cache on the disk too? For the following calculation, how do I add the controller cache to get the correct IOPS? Thanks in advance.

>Workload at the disks for RAID 5:
6000 read + (2000 x 4) write = 14,000 IOPS.
Assume 180 IOPS per 15K FC drive:
14000/180 = 78 drives to handle the same workload.
Commented:
>Is there a cache on the disk too?
Yes - but it is disabled by default on an NS20, and you can't enable it. CX4-based arrays enable the on-disk cache and do some cleverness to guarantee data integrity.

BTW - create the RAID 1 set as a 2-disk RAID 1/0 set - then it's easy to expand by adding more drives. A RAID 1 set can only ever have 2 drives, which can be a bit limiting.

All writes go via write cache (unless it's a large sequential write of more than 2MB), and the CLARiiON that underpins the NS20 allocates write cache as it is required. Burst speed to cache can exceed 100,000 IOPS, no sweat - after all, you're writing to memory via a high-speed, low-latency protocol. When you're calculating performance, always work on the disk IOPS - the write cache is there to improve performance and to absorb peaks in the workload. You need to consider what happens if there aren't enough disks to absorb the workload - eventually, write cache fills and you're in a world of pain.

With your 5 disks, you'll get 900 IOPS day in, day out, and the array won't even break a sweat. You can rely on getting 1500 IOPS and still get excellent performance. Push things to 2000 IOPS and you'll see response times start to climb. The point is, you've got a fair degree of headroom, so you can keep an eye on the workload and plan for additional drives if you need them. The NS20 has virtual LUN migration, so you can move a LUN live to new drives with zero downtime if you need to, or you can use a striped metaLUN to add more spindles and space to the existing LUN and get more performance that way.
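A small sketch of that headroom reasoning, using the 2.5x peak multiplier mentioned earlier in the thread (these are rules of thumb, not guarantees):

```python
def workload_headroom(drives, base_iops=180, peak_multiplier=2.5):
    """Rule-of-thumb capacity bands for a group of identical drives."""
    sustained = drives * base_iops        # the day-in, day-out rate
    peak = sustained * peak_multiplier    # response time gets ugly up here
    return sustained, peak

sustained, peak = workload_headroom(5)
print(sustained, peak)                    # prints 900 2250.0
```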

The NS20 is good stuff!
Author Commented:
Thanks a lot. How do I see whether it is providing sequential performance or random performance? Do we need to configure something? If so, which would you prefer?
Author Commented:
Also, when we calculate IOPS, don't we need to check the disk size? In order to get a certain number of IOPS we need more disks, but when it comes to space we may not be using all of it. Are we going to waste it? What is the connection between disk size and IOPS?
Commented:
>sequential performance or random performance
That's determined by the host and what it's doing. A database tends to generate random I/O, as does Microsoft Exchange (the new version, 2010, is likely to be an exception to that rule). VMware generates a very highly random I/O pattern, as many virtual machines are accessing the underlying disk. Backup to disk and media serving generate sequential I/O patterns, while file serving tends to be a mix of both. You can see the I/O sizes in Navisphere Analyzer on the underlying CLARiiON. Once I/O sizes hit 64K or larger, they're considered to be sequential. Most SAN environments tend to be a mix of the two, with most I/O patterns being random with a light sprinkling of sequential to keep things interesting.
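That 64K rule of thumb is trivial to express (a sketch only; real analysis tools like Navisphere Analyzer look at access patterns, not just I/O size):

```python
def classify_io(size_bytes):
    """Per the rule of thumb above: I/Os of 64K or larger are treated as sequential."""
    return "sequential" if size_bytes >= 64 * 1024 else "random"

print(classify_io(8 * 1024), classify_io(64 * 1024))   # prints random sequential
```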

>Also, when we calculate IOPS, don't we need to check the disk size?
No - spin speed is what's important. The size doesn't have any effect on performance, with one exception: if you're building a SAN layout that attempts to extract every last teensy-weensy bit of performance, you can 'short-stroke' the drives. That means you only use part of each disk, which limits the maximum seek distance (obviously, the heads moving across only a third of the disk surface take less time than the heads moving across the whole disk). Performance is also better on the outside of the disk platters than at the centre. Short-stroking is not a technique you'd normally use - it's expensive, for one, as you only use part of the capacity of each drive, and you really would have to be chasing every last bit of performance.

One important thing to do, though, is to make absolutely sure you set the alignment offset appropriately for your OS. For Windows and Intel-based Linux, it's 64K. Here's a good article on why: http://clariionblogs.blogspot.com/2008/03/setting-alignment-offset-on-esx-server.html
Author Commented:
meyersd,

You are a genius in storage - thanks a lot. Last question: in the IOPS calculation you used an 8K block size. How did you come up with 8K?
Commented:
Thanks! You're very kind.  :-)

VMware writes in 8K blocks. Windows writes in 4K blocks for small volumes, then larger block sizes as the volumes increase in size. SQL tends to write 4K blocks, too. The CLARiiON that is part of your NS20 uses 8K cache page sizes. An 8K block size indicates random I/O, whereas a larger block size of 64K or greater is treated as sequential I/O.

I do a lot of analyses of CLARiiON workloads using Navisphere Analyzer as part of my job (I'm a Solutions Architect for an EMC partner) - most environments tend to present highly random workloads with an average block size of 8K. The write proportion is almost always between 25% and 50%, with apps like Exchange at the 25% end of the spectrum, so that's handy for doing array performance calculations.

One thing I keep seeing is that organisations tend to outgrow the number of drives they have, especially when they've virtualised heavily, because it's just so easy to add another VM - no-one thinks about what's going on under the covers. Eventually, you just run out of spindles to handle the workload. The solution is simple - buy more disk!  :-)