JerryJay

How to understand RAID disk striping size?

I am reading an article about RAID disk striping. Below are its statements about stripe sizing:

-----------------------
During intense I/O operations, performance can be optimized by striping the drives in the array with stripes large enough so that each record potentially falls entirely within one stripe segment. This helps insure that data and I/O operations are evenly distributed across the arrayed drives, thus allowing each drive to work on separate I/O operations at the same time.

By contrast, in data-intensive applications that access large records, smaller stripe sizes can be used so that each record will span across many, or all, of the drives in an array with each drive storing only part of a record’s data. This allows long record accesses to be performed faster, since the data transfers can occur in parallel on multiple drives in the array. Applications such as digital video editing, audio/video on demand, imaging and data acquisition that employ long record accesses are examples of applications that often achieve optimum performance with smaller stripe sizes.
Unfortunately, smaller stripe sizes typically rule out multiple overlapping I/O operations since each I/O will typically involve all of the drives.
------------------------------
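To make the quoted trade-off concrete, here is a minimal sketch that counts how many drives a single record touches for a given stripe element size. It assumes a plain striped layout with no parity (RAID 0 style) and round-robin placement of elements; the function name `drives_touched` and the sizes used are illustrative only, not taken from the article.

```python
KIB = 1024

def drives_touched(offset, length, element_size, num_drives):
    """Count distinct drives one I/O touches in a simple round-robin stripe.

    Assumption: no parity, element i lives on drive i % num_drives.
    """
    if length <= 0:
        return 0
    first = offset // element_size                 # first stripe element hit
    last = (offset + length - 1) // element_size   # last stripe element hit
    # Consecutive elements sit on consecutive drives, so the count is
    # capped by the number of drives in the array.
    return min(num_drives, last - first + 1)

# Small record, large element: stays on one drive, so the other
# drives are free to serve other I/Os at the same time.
small_io = drives_touched(0, 8 * KIB, 64 * KIB, 4)

# Large record, small element: spreads over every drive, so the
# transfer runs in parallel but ties up the whole array.
large_io = drives_touched(0, 1024 * KIB, 16 * KIB, 4)

print(small_io, large_io)
```

With these made-up numbers the 8 KiB record occupies a single drive while the 1 MiB record occupies all four, which is the contrast the article is describing.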

I don't quite understand the difference between configuring a smaller stripe size and a larger one. After reading this, I get the feeling that with a smaller stripe size a record spans multiple disks more easily, and is therefore more efficient for I/O operations. Is that right?

thanks,
jerry
ASKER CERTIFIED SOLUTION
David
JerryJay

ASKER

As a newbie, this is a new topic for me. I looked up the concepts of IOPS and throughput. It seems to me that IOPS is discussed more for environments such as web, email, and database systems that require frequent small reads and writes, with the focus on random read/write performance. Throughput is discussed more for applications such as video recording systems that need sequential read/write performance.
Please correct me if I got this wrong.
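The two measures are tied together by the I/O size: throughput is IOPS multiplied by the size of each I/O. A minimal sketch of that arithmetic follows; the workload numbers are hypothetical and only meant to show why a small-block workload can post high IOPS with modest throughput, and a large-block workload the reverse.

```python
def throughput_mib_per_s(iops, io_size_kib):
    """Throughput in MiB/s for a given I/O rate and per-I/O size in KiB."""
    return iops * io_size_kib / 1024

# Hypothetical database-style workload: many small random I/Os.
db_like = throughput_mib_per_s(iops=5000, io_size_kib=8)

# Hypothetical video-style workload: few large sequential I/Os.
video_like = throughput_mib_per_s(iops=200, io_size_kib=1024)

print(db_like, video_like)
```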

"They mean that "intense I/O operations" means when I/Os per second is important.  This is typically "database" or transactions per second.  "Data-intensive applications" are then throughput-intensive." This is much clearer to me now, and your examples are very helpful too.

Thanks, dlethe.


andyalder:

Thank you for your useful comment and example. "A small stripe element size is rarely useful": is there a measure of how small counts as small? Is there a general rule or formula to calculate the right size?


thanks,
Jerry

Oh, another one, regarding the RAID levels (RAID 0, 1, 2, 3, 4, 5...): I only have practical experience with RAID 1 and 5. What about the other RAID levels? Are they used as commonly as RAID 1 and 5? Some technical documents discuss them only from a technical point of view and don't mention where or why to use them. In the real world, do people actually use them at all?
I have also heard about JBOD. What about that one?

As a rule of thumb, make the stripe element size the same as the size of the database I/Os. For example, Exchange uses 16K blocks to store data, so a 16K stripe element size is used. Twice that size is generally OK as well: Exchange runs fine on a 32K stripe size; you just end up reading 32K to get 16K off it.

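The rule of thumb above can be sketched as a small calculation of how many stripe elements one I/O covers and how much data gets read as a result. This assumes, as the reply describes, that the controller reads whole stripe elements; real controllers and caches may behave differently. The name `io_footprint` is made up for illustration.

```python
KIB = 1024

def io_footprint(offset, io_size, element_size):
    """Return (elements covered, bytes read) for one I/O.

    Assumption: the controller always reads whole stripe elements.
    """
    first = offset // element_size
    last = (offset + io_size - 1) // element_size
    elements = last - first + 1
    return elements, elements * element_size

# 16K I/O on a 16K element: one element, nothing wasted.
matched = io_footprint(0, 16 * KIB, 16 * KIB)

# 16K I/O on a 32K element: still one element, but 32K is read.
doubled = io_footprint(0, 16 * KIB, 32 * KIB)

# 16K I/O on an 8K element: the I/O is split across two elements.
halved = io_footprint(0, 16 * KIB, 8 * KIB)

# 16K I/O that starts 4K into a 16K element: misalignment also
# makes it straddle two elements.
misaligned = io_footprint(4 * KIB, 16 * KIB, 16 * KIB)

print(matched, doubled, halved, misaligned)
```

The last case is why it matters that the starting offsets line up as well as the sizes: a matched element size does not help if every I/O begins partway through an element.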
Thanks, andyalder and dlethe, for your detailed replies.

I don't quite understand the following points in your replies:

"Optimize for one, and you decrease the other": why is that? Can you please explain this a bit more? See my Exchange example below: if I configure the block and stripe sizes to fit the application's needs perfectly, will I get the best of both IOPS and throughput?

"but the best thing you can do is make the application, controller, file system, and cache settings on physical drives all pretty much agree on what they will be doing"  - dlethe

Let me use a specific Exchange server as an example. Andyalder says that Exchange uses 16K blocks to store data, and the NTFS default block (allocation unit) size is 4K. To achieve the best performance, from a design point of view, should I format the NTFS volume (the Exchange data disk only) with a 16K block size instead of the default 4K, and set the stripe size to 16K or perhaps a little bigger, as andyalder mentioned previously? That is, just to make the application, controller, and file system agree on what they will be doing? Do I have this right?

many thanks,
jerry
Thank you all, very helpful!!!