

How to understand RAID disk striping size?

Posted on 2010-09-19
Last Modified: 2013-11-14
I am reading an article about RAID disk striping. Below are its statements about stripe sizing:

During intense I/O operations, performance can be optimized by striping the drives in the array with stripes large enough so that each record potentially falls entirely within one stripe segment. This helps insure that data and I/O operations are evenly distributed across the arrayed drives, thus allowing each drive to work on separate I/O operations at the same time.

By contrast, in data-intensive applications that access large records, smaller stripe sizes can be used so that each record will span across many, or all, of the drives in an array with each drive storing only part of a record’s data. This allows long record accesses to be performed faster, since the data transfers can occur in parallel on multiple drives in the array. Applications such as digital video editing, audio/video on demand, imaging and data acquisition that employ long record accesses are examples of applications that often achieve optimum performance with smaller stripe sizes.
Unfortunately, smaller stripe sizes typically rule out multiple overlapping I/O operations since each I/O will typically involve all of the drives.

I don't quite understand the difference between configuring a smaller stripe size and a larger one. After reading this, I get the feeling that with a smaller stripe size a record can more easily span multiple disks, and is therefore more efficient for I/O operations?

Question by:JerryJay

Accepted Solution

David earned 668 total points
ID: 33710843
Your premise is flawed to begin with. There are two metrics of performance: IOPS and throughput. Optimize for one, and you decrease the other. Your question treats throughput as the only measure of performance.

If that were the case, then the larger the block size, the greater the performance, which is not true.

The author does a poor job of differentiating the two. By "intense I/O operations" they mean workloads where I/Os per second matter. This is typically a database, or anything measured in transactions per second. "Data-intensive applications" are then throughput-intensive.
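
As a rough illustration of how the two metrics relate, throughput is simply IOPS multiplied by the size of each I/O. The sketch below uses made-up workload numbers (the 5000 IOPS at 8 KB and 200 IOPS at 1 MB figures are assumptions for illustration, not measurements) to show why a "database-like" load can be busy in IOPS terms while moving little data, and a "video-like" load the reverse.

```python
def throughput_mb_per_s(iops: float, io_size_kb: float) -> float:
    """Throughput implied by a given IOPS rate at a fixed I/O size."""
    return iops * io_size_kb / 1024.0


# Hypothetical workloads; the numbers are illustrative assumptions only.
database = throughput_mb_per_s(iops=5000, io_size_kb=8)    # many small I/Os
video = throughput_mb_per_s(iops=200, io_size_kb=1024)     # few large I/Os

print(f"database-like: 5000 IOPS -> {database:.1f} MB/s")
print(f"video-like:     200 IOPS -> {video:.1f} MB/s")
```

The first workload issues 25 times as many I/Os per second yet moves far less data, which is why a single "performance" number is not enough to choose a stripe size.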

When a controller does a write, it is obligated to write the entire stripe, because it has to do so in order to write parity. (This is the norm with most RAID controllers.) So if you have a stripe of 64KB and you change just one byte, your controller writes 64KB (which translates into much more than a single 64KB write once mirroring or parity is counted, assuming RAID 1, 10, 5, or 6). If you are using NTFS and took the defaults, then NTFS natively writes just 4KB at a time, so you have 16x more data written than necessary.

Conversely, if you have a stripe size of 4KB and are using SQL Server, then SQL Server writes 64KB at a time. So you now have the opposite problem. RAID controllers and file systems will automatically queue up and reorder I/Os to try to overcome this, but the best thing you can do is make the application, controller, file system, and cache settings on the physical drives all pretty much agree on what they will be doing.
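
The arithmetic behind both mismatches can be sketched as follows. This is a simplified model, not a description of any particular controller: it assumes aligned I/O, assumes the controller always rewrites whole stripes as described above, and does not count parity or mirror copies. The function names are hypothetical.

```python
KB = 1024


def stripes_covered(io_bytes: int, stripe_bytes: int) -> int:
    """Number of stripes an aligned I/O of io_bytes touches (ceiling division)."""
    return -(-io_bytes // stripe_bytes)


def bytes_written(io_bytes: int, stripe_bytes: int) -> int:
    """Bytes written if whole stripes are always rewritten.

    Assumption: aligned I/O; parity and mirror copies are not counted.
    """
    return stripes_covered(io_bytes, stripe_bytes) * stripe_bytes


# 4KB NTFS write onto a 64KB stripe: far more data written than requested.
ntfs_case = bytes_written(4 * KB, 64 * KB) / (4 * KB)

# 64KB SQL Server write onto 4KB stripes: one request split into many pieces.
sql_case = stripes_covered(64 * KB, 4 * KB)

print(f"4KB write, 64KB stripe: {ntfs_case:.0f}x the requested data is written")
print(f"64KB write, 4KB stripe: request is split across {sql_case} stripes")
```

When the write size and stripe size agree, both numbers come out as 1, which is the point of making the layers agree.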

This is a simplistic response. Proper tuning goes well beyond this, because I/O overhead and efficiency vary between reads and writes, and between random and sequential access. For example, most decent controllers will maintain several I/O queues on reads, so if you have RAID1, the disk that can give you the data first responds. You effectively get 2x read performance with RAID1. But on a write, both disks have to be written, so you have slightly slower performance than if it were a single drive. If you have cache and a BBU on the controller, and write-back enabled, write performance will be faster than if you just had one disk.

Assisted Solution

by:Handy Holder
Handy Holder earned 1332 total points
ID: 33711139
A small stripe element size is rarely useful. It makes a single large I/O quicker since it can be split across multiple disks, but if there are several I/Os in the pipeline they have to queue up, since all the disks are in use for the first I/O. It's generally better to have a large stripe element size so that multiple I/Os can be performed in parallel.

Using the digital imaging example that the article gives above, you can imagine cutting a photo into 10 slices and getting 10 people to stick one slice each into their album. Since it takes time to apply the paste to the back, this is quicker than one person pasting the whole photo into a single album. Conversely, if there are 10 photos to stick, it's quicker not to cut the photos up but to give one each to your people to stick in their photo albums.

It's the same argument as between RAID3 and RAID5: every disk is tied up in a single I/O for RAID3, whereas only one (or 4 for write) are tied up in a single RAID5 I/O, so http://bytepile.com/RAID_3_vs_RAID_5_In_HPC.php is well worth reading. Just think of RAID 3 in that example as any RAID with a small stripe element size and RAID 5 as any RAID with a large stripe element size.

>I get the feeling that with a smaller stripe size a record can more easily span multiple disks, and is therefore more efficient for I/O operations?

The smaller the stripe size is the more easily a single I/O will tie up all your disks. Is that efficient?
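
To make that concrete, here is a minimal sketch that counts how many disks one I/O occupies for different stripe element sizes. It assumes a plain round-robin layout with no parity (RAID 0 style) and an 8-disk array; the function name and the sizes are hypothetical choices for illustration.

```python
KB = 1024


def disks_touched(offset: int, length: int, element_size: int, n_disks: int) -> int:
    """Count the distinct disks one I/O touches.

    Assumption: stripe elements are laid out round-robin across the disks
    with no parity, so consecutive elements land on consecutive disks.
    """
    first = offset // element_size
    last = (offset + length - 1) // element_size
    n_elements = last - first + 1
    return min(n_elements, n_disks)


N_DISKS = 8
for element_kb in (4, 64, 256):
    busy = disks_touched(0, 256 * KB, element_kb * KB, N_DISKS)
    print(f"256KB I/O, {element_kb:>3}KB element: {busy} of {N_DISKS} disks busy")
```

With a 4KB element the single 256KB request occupies every disk, so a second request has to wait. With a 256KB element it occupies one disk and the other seven are free to serve other I/Os.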


Author Comment

ID: 33711255
As a newbie, this is a new topic to me. I looked up the concepts of IOPS and throughput. It seems to me that IOPS is discussed more in environments like web, email, and database systems that require frequent reads and writes of small files, with the focus on random read/write, whereas throughput is discussed more for applications such as video recording systems that need sequential read/write performance.
Please correct me if I got this wrong.

"They mean that "intense I/O operations" means when I/Os per second is important.  This is typically "database" or transactions per second.  "Data-intensive applications" are then throughput-intensive." --- this is much clearer to me now; your examples are very helpful too.

Thanks, dlethe.


Author Comment

ID: 33711330

Thank you for your useful comment and example. "A small stripe element size is rarely useful" -- is there a measure of this? How small counts as small? Is there a general rule or formula to calculate the right size?


Author Comment

ID: 33711378
Oh, another one, regarding the RAID types 0, 1, 2, 3, 4, 5...: I only have practical experience with RAID 1 and 5. What about the other RAID types? Are they used as commonly as RAID 1 and 5? Some technical docs only discuss them from a technical point of view and don't mention where or why to use them. In the real world, do people actually use them at all?
I have also heard about JBOD. What about that one?

Expert Comment

by:Handy Holder
ID: 33711580
As a rule of thumb, you make the stripe element size the same as the size of the database I/Os. For example, Exchange uses 16K blocks to store data, so a 16K stripe element size is used. Twice the size is generally OK as well: Exchange runs fine on a 32K stripe size, you just end up reading 32K to get 16K off it.

Author Comment

ID: 33714526
Thanks, andyalder and dlethe, for your detailed replies.

I don't quite understand the following points in your replies:

"Optimize for one, and you decrease the other": why is that? Can you please explain this a bit more? See my Exchange example below: if I configure the block and stripe size to fit the application's needs perfectly, will I get the best of both IOPS and throughput?

"but the best thing you can do is make the application, controller, file system, and cache settings on physical drives all pretty much agree on what they will be doing"  - dlethe

Let me use a specific Exchange server as an example. Andyalder says that Exchange uses 16K blocks to store data, and the NTFS default block size is 4K. To achieve the best performance, from a design point of view, should I format my NTFS volume (Exchange data disk only) with a 16K block size instead of the default 4K, and set the stripe size to 16K or perhaps a little bigger, as andyalder mentioned previously, just to make the application, controller, and file system agree on what they will be doing? Do I have this right?

many thanks,

Assisted Solution

by:Handy Holder
ID: 33715812
My bad, Exchange uses 32K blocks, not 16K; the principle is the same though. Assuming it is Exchange, you might as well set the cluster size to 32K as well as the RAID stripe size, but it's not so important, since Windows doesn't ask for just one cluster, wait for it, and then ask for the next; it asks for all of them at the same time. Cluster size has more to do with the minimum amount of space used to store a file and how many entries have to be in the file allocation table (the index of what's where on the disk).
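
The two effects of cluster size mentioned here (minimum space per file, and number of allocation entries) can be sketched with simple arithmetic. This is a simplified model with hypothetical function names and example file sizes; it ignores filesystem details such as very small files being stored inside metadata records.

```python
KB = 1024
GB = 1024 ** 3


def clusters_needed(file_bytes: int, cluster_bytes: int) -> int:
    """Clusters required to hold a non-empty file (ceiling division)."""
    return -(-file_bytes // cluster_bytes)


def slack_bytes(file_bytes: int, cluster_bytes: int) -> int:
    """Allocated but unused space at the end of the file's last cluster."""
    return clusters_needed(file_bytes, cluster_bytes) * cluster_bytes - file_bytes


for cluster_kb in (4, 32):
    cluster = cluster_kb * KB
    print(
        f"{cluster_kb:>2}K clusters: "
        f"10,000-byte file wastes {slack_bytes(10_000, cluster)} bytes, "
        f"1GB file needs {clusters_needed(1 * GB, cluster)} clusters"
    )
```

Larger clusters waste more space on small files but need fewer entries to describe a large file such as a mail database.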

Author Closing Comment

ID: 33759841
Thank you all, very helpful!!!


Question has a verified solution.
