neil4933

RAID and application behaviour

Hi

I know that there is RAID 5, RAID 1 etc, and some RAID levels are better for some types of application behaviour than others. For instance, RAID 1 is supposedly better for transactional logging because of the write behaviour.

Does anyone have a list of the different RAID types for Windows servers and which application behavior they suit best?
PaulColuccio

Technically there are three:

1. Striped, which is like RAID 0. Best IOPS, but no redundancy. No disk space loss.
2. Mirrored, complete redundancy, but loss of IOPS. Mirrored, you lose 50% of total disk space.
3. RAID 5, striped with parity. Requires 3 or more drives. Better IOPS than a standard disk, but less than striped. The parity costs you one drive's worth of disk space (about 33% with three drives, less with more). Usually the best choice for space and speed.
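
The space figures in the list above can be sketched as a small calculation. This is only a rough illustration, assuming identical drives and a plain two-way mirror; the function name `usable_fraction` and the layout labels are hypothetical and not part of any Windows tool.

```python
def usable_fraction(layout: str, drives: int) -> float:
    """Return the fraction of raw capacity left for data.

    Assumes identical drives; the layout labels are made up for this sketch.
    """
    if layout == "striped":      # like RAID 0: nothing is reserved
        if drives < 2:
            raise ValueError("striping needs at least 2 drives")
        return 1.0
    if layout == "mirrored":     # two-way mirror: every block is stored twice
        if drives != 2:
            raise ValueError("this sketch only models a 2-drive mirror")
        return 0.5
    if layout == "raid5":        # one drive's worth of space holds parity
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (drives - 1) / drives
    raise ValueError(f"unknown layout: {layout}")


if __name__ == "__main__":
    for n in (3, 4, 8):
        lost = 1 - usable_fraction("raid5", n)
        print(f"RAID 5 with {n} drives loses {lost:.0%} of raw space")
```

The RAID 5 overhead shrinks as drives are added, which is why a single percentage for parity loss only holds for one array size.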
David
RAID0 is not necessarily best in IOPS. I can easily configure a RAID1 to outperform a RAID0 in IOPS, as long as the I/Os are reads. If those I/Os are writes, and random instead of sequential, then RAID1 will generally outperform many RAID0s, unless the block size gets to a certain point.

The moral of the above is that there are variables, and simply choosing a RAID level as the only differentiator probably isn't good enough.

RAID5 can and will run slower than even a single disk drive on writes if your controller doesn't have a write cache, or the cache is saturated. But on pure reads, a 3-disk RAID5 in a perfect world could be 2-3X faster than a single disk drive (unless that disk drive is using software RAID-1 and you are doing reads).
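
As a rough illustration of why RAID5 writes can fall below a single disk, here is a minimal sketch using the common "write penalty" rule of thumb. The penalty values and the function name `estimated_iops` are assumptions brought in for illustration, not figures from this thread, and the model deliberately ignores caches, block size, and sequential versus random access, which are exactly the variables that change the answer in practice.

```python
# Back-end disk operations needed per front-end write (rule of thumb, no cache).
# These values are a widely quoted approximation, not a measurement.
WRITE_PENALTY = {"raid0": 1, "raid1": 2, "raid5": 4, "raid6": 6}


def estimated_iops(level: str, drives: int, iops_per_drive: float,
                   read_fraction: float) -> float:
    """Estimate front-end IOPS for an uncached array under a read/write mix."""
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    raw = drives * iops_per_drive            # total back-end IOPS available
    write_fraction = 1.0 - read_fraction
    # Each read costs one back-end I/O; each write costs `penalty` of them.
    cost_per_io = read_fraction + write_fraction * WRITE_PENALTY[level]
    return raw / cost_per_io


if __name__ == "__main__":
    single_disk = 100.0  # hypothetical IOPS of one drive
    print(estimated_iops("raid5", 3, single_disk, read_fraction=1.0))
    print(estimated_iops("raid5", 3, single_disk, read_fraction=0.0))
```

Under these assumptions a 3-disk RAID5 does about three times a single disk on pure reads but less than a single disk on pure writes. A write cache or a different controller can move both numbers considerably.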

Change controllers (including certain software RAID and hardware RAID controllers), and/or get into a degraded mode, and the rules change even more.

How about this.. what do you have and what are you trying to do?
Generally speaking more spindles = more speed.

SQL server and Exchange have their own tweaks concerning how to set up RAID for their apps.

http://www.msexchange.org/articles_tutorials/exchange-server-2010/high-availability-recovery/raid-considerations-exchange-2010.html
"Generally speaking more spindles = more speed."
Unless spindles = 0, then you'll have the most speed of all. ;)

(Apologies to the experts. As I noted, I'm on a soapbox against generalizing RAID level as the sole differentiator for decisions based on performance, with the intent to demonstrate that RAID level is maybe 25% of the suitability puzzle.)
neil4933

ASKER

Hello.

>> How about this.. what do you have and what are you trying to do?

I don't have anything yet ;-) We are due to onboard a few new applications in several months and I was interested in learning how different RAID levels were suitable for different application behaviours.

I know that some people say RAID5 is best for everything, but I've heard that if the app behaviour is sequential writes then one type of RAID is better, if the app is sequential reads, then another, if the app is random writes, then another, etc.

How about I give a few examples?

A) Sequential writes
B) Sequential reads
C) Random writes
D) Random reads

Also, if parts of the app wanted to carry out a mixture of the above, are we better off separating them onto separate RAID drives?
OK, big picture on generalities, but from the perspective of a server and database with a typical transactional load: maybe 90% read, localized indexes, and an expectation that downtime is expensive enough to plan against it.

1. Use a RAID1 for the O/S, index files, swap, logs, and scratch files. Use SAS technology.
2. If you have a "high" amount of writes and lots of blobs (images, PDFs, etc.) then go RAID6, but only if you have a great controller with battery backup, and SAS disks. RAID6 has twice the parity (protection against disk or block failure) of RAID5. So from a 10,000' view, if you lose a disk with RAID5 you are screwed if you then hit a bad block, but with RAID6 you can lose a disk and still survive one. The performance hit might be 15% on a $1000+ controller over RAID5.

The RAID6 is ONLY for the non-index part of the database. The idea is that the index files go on the fastest drive, so you avoid expensive seeks and the computer goes exactly to where the data is, so I/O is kept to a minimum on the RAID6.

If you want LOTS of speed and extra data protection, find a controller that uses 3-way RAID1 or a 3-disk RAID6. That means you have protection against 2 drive failures, and since the data is always in 3 different places, it will absolutely scream on performance (on reads). On writes, it will be about the same speed as a single disk.

I would do this much differently if you had under 1 TB or over 10TB, or had an unusual type of database.

You can help make a better decision by modeling what you have today.  If this database is in operation on another Windows server, then use perfmon to measure the I/O you are currently doing to add a sense of scale.
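
As a minimal sketch of that modelling step, the snippet below summarizes a series of per-second samples into a read/write mix and a peak load. It assumes you have already exported samples of the PhysicalDisk "Disk Reads/sec" and "Disk Writes/sec" counters from perfmon into two lists; the function name `summarize_io` and the sample values are hypothetical.

```python
def summarize_io(reads_per_sec: list[float], writes_per_sec: list[float]) -> dict:
    """Summarize paired read/write rate samples taken at the same instants."""
    if not reads_per_sec or len(reads_per_sec) != len(writes_per_sec):
        raise ValueError("need two non-empty sample lists of equal length")
    # Combined I/O rate at each sample point.
    totals = [r + w for r, w in zip(reads_per_sec, writes_per_sec)]
    total_reads = sum(reads_per_sec)
    total_io = total_reads + sum(writes_per_sec)
    return {
        # Share of all I/O that was reads (0.0 if the disk was idle throughout).
        "read_fraction": total_reads / total_io if total_io else 0.0,
        "avg_iops": sum(totals) / len(totals),
        "peak_iops": max(totals),
    }


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    print(summarize_io([90.0, 180.0, 135.0], [10.0, 20.0, 15.0]))
```

The read fraction indicates how much parity write overhead matters for the workload, and the peak figure gives a sense of how many spindles the new array needs.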

Remember also that databases like RAM, and aren't generally CPU hogs. So you're probably throwing your money away by buying more than 2-4 cores, and that money should be spent on I/O and/or RAM.
If your data requirements are not too big or you have a large budget, you can use SSDs to get some very high performance for your applications. A single SSD will outperform a single hard drive, and you can configure them in RAID arrays as well. You can also set up a hybrid solution with SSDs for frequently accessed data (logs) and hard drives for the rest.
Hi guys

This isn't really for any set application, it's more to help me with my understanding of RAID technology.

I guess what I was looking for was an answer to the below questions:


A) Sequential writes
B) Sequential reads
C) Random writes
D) Random reads

In such a way that:

a) Sequential writes - RAID1 is generally better for sequential writes (e.g. log drives) BECAUSE....

and so on...is this possible?
ASKER CERTIFIED SOLUTION
David
This solution is only available to members.
Thanks dlethe, that's exactly what I was looking for...

Just one thing though - is it possible to explain your conclusions? Just so I understand?
Not really.

In order to explain the generalization, one has to understand all the variables that affect any generalization. That is way too much work, and at one time or another I probably have gone into every one of these points in depth.

Suffice to say that even performance itself has to be looked at by I/Os per second, throughput, number of simultaneous transactions, the number of bytes in each transaction, whether the data is at the beginning or end of the disk, whether I/Os are split between drives that have the same data, whether it is in the cache or not (many different caches)... and so many more.

Way too much. That is why you get a generalization w/o details. I've probably gone into depth on every one of these aspects in previous threads. But no way can I or will I get into all of them.