AlexC77 asked:
What is faster - RAID 1 or RAID 5?

I have a big database (approximately 20 GB). The software allows splitting this database across several logical/physical drives.
I have 6 SCSI drives.

I'm thinking about two ways of organizing this:

1) Create 3 RAID 1 arrays out of my 6 hard drives and split the database between those 3 arrays.
2) Create 2 RAID 5 arrays out of my 6 hard drives and split the database between those 2 arrays.


I need help choosing the better of these two options.
The goal is the fastest possible database access from clients.
Any calculations, or a link to a page about how to calculate the total speed for each variant, would be the best help.

Thanks.
ASKER CERTIFIED SOLUTION
jdlambert1:

I might mention that RAID 0 is the fastest, but it provides no fault tolerance -- if one drive fails, it's toast. Some folks, if they can afford it, set up two sets of RAID 0, then mirror one to the other to get the best performance and some fault tolerance, but the SCSI adapter has to support that configuration, and they charge more for that kind...
AlexC77 (ASKER):

SCSI controller is not an issue, I can get any.
I need some examples with calculation if possible.
Oh, and just because you have RAID 5 and the system will keep working if a drive dies, it doesn't mean you can safely ignore the drives. I was called in once and they couldn't understand why their system quit, because they "paid all that money" for a fault tolerant system. Well, the first drive failed months before and no one noticed. When the second one failed and the system quit, it was too late. Not surprisingly, these folks didn't have recent backups, either.

So, someone should check the "idiot lights" every day, or, if the drives and controller card/drivers support it, you can configure many systems to send an email alert when a drive fails.

HTH
AlexC77 (ASKER):
That's not a problem either. One of our techs checks everything every day.
Oh, one other thing. If you mirror two 100GB drives, you'll have 100GB of usable space -- you lose half for the fault tolerance without gaining any performance. If you gang three 100GB drives in RAID 5, you'll have 200GB of usable space -- you only lose 1/3 for the fault tolerance while increasing performance significantly.
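That capacity arithmetic can be sketched in a few lines of Python (a hypothetical helper, not part of the thread):

```python
def usable_capacity(drive_gb, n_drives, level):
    """Usable space for a few common RAID levels."""
    if level == 0:                       # striping: all space usable
        return drive_gb * n_drives
    if level == 1:                       # mirroring: half the space
        return drive_gb * n_drives // 2
    if level == 5:                       # one drive's worth lost to parity
        return drive_gb * (n_drives - 1)
    raise ValueError("unsupported RAID level")

print(usable_capacity(100, 2, 1))   # mirror of two 100GB drives -> 100
print(usable_capacity(100, 3, 5))   # RAID 5 across three 100GB drives -> 200
```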
AlexC77 (ASKER):
Thank you for that explanation, but I am familiar with different RAID options, and I know how to calculate space.
I need to know if there is any way to calculate the speed.
Thanks.
I don't know of any sites that provide statistics. That would vary a *lot* with every combination of drive brand/model, SCSI controller brand/model, and configuration.

Since you said you could get any controller... If you haven't gotten the drives yet, you can consider getting a 3ware controller (www.3ware.com) and using less expensive IDE drives. Not only is it less expensive, they give superior performance -- they even provide a performance boost on RAID 1, because they made their controller smart enough to read half from one of the mirrored drives, while the other half is being read from the other mirrored drive. SCSI doesn't do that, it just reads from one of the mirrored drives unless that drive fails, then it reads from the other one.

3ware was established by Adaptec engineers who were unhappy with the lack of intelligent innovation at Adaptec, so they left and started their own company. They're doing a great job designing their systems. So why aren't other companies using the same great techniques? Patents!
I've been poking around, between posts, and still haven't found any web sites with useful stats. The only thing I've found at all only says how to calculate throughput by dividing the amount of data read or written by the amount of time it takes -- duh.

I'm a little surprised that some of the hardware vendors don't seem to have useful stats in comparison charts for marketing purposes, but I guess it's just because there are too many variables.
AlexC77 (ASKER):
Can you post the URL you found here?
SOLUTION
From a theoretical point of view, with no limitation on the bus:
RAID 1
- you read 2x faster than with just one disk, because reads can be done simultaneously from the first and the second disk
- you write at the same speed as with one disk (you write the complete information to both disks simultaneously)

RAID 5
- you read 2x faster than with just one disk, because you read half the information from each of two disks
- you write 2x faster than with just one disk, because you write half the information to each of the three disks in your array

So from a performance point of view, RAID 5 is more powerful. BUT all read/write operations need computation to reconstruct/split the information, so the quality of your RAID 5 card is very important!

From a fault-tolerance point of view, nothing is better than RAID 1: you need nothing more than one disk to recover all your data; you can take a disk, put it in another server, and it will work... Specialists in data recovery after big crashes all agree on that:
- if you have 2 broken disks (this is very frequent, since disk failures are often consequences of electrical shocks that hit your whole system) and you must send your disks to a specialist to recover the data, this is simpler with two identical disks (the sectors you cannot read on one are recovered from the second) than with three complementary disks from a RAID 5 system, where each byte is split across two disks.

My recommendation is:
- for the best performance, choose RAID 5
- if fault tolerance is the main concern, choose RAID 1

Julien.
http://www.acnc.com/04_01_50.html for details on different RAID levels.
The--Captain:
>- to have the best performances, choose RAID 5
>- if you have a fault tolerance problematic, choose RAID 1

AFAIK, RAID 0 combines drives into a single logical unit but can write to all contained drives (almost) simultaneously. Thus you can have a 160GB array comprised of 4 40GB drives which works at almost 4x the speed of a similar 160GB drive.

AFAIK, RAID 1 combines drives into a single logical unit but writes the same data to all drives in the array (without too much of a performance hit for writing to the additional drives). Thus, you could have a 40GB array comprised of 4 40GB drives, which would give you triple data redundancy at almost the same speed as a single similar 40GB drive.

AFAIK, RAID 5 combines these features (where does hot-swap figure in all this, BTW?), giving you the speed benefits of RAID 0 but also the redundancy of RAID 1 (though of course you need even more drives). The speed of the array (if it's comprised of decent components) should be roughly the base speed of the component drives (since they should all be of approximately the same size and speed) multiplied by the number of striped drives (not the mirrored drives) in the array.

Hope that helps...

Cheers,
-Jon
SOLUTION
RAID-1 will be about 50% faster than RAID-5 doing reads, regardless of size.

There are, in fact, two variations of RAID-1, RAID 0+1 and RAID 1+0. Without going into too much detail here (this is going to wind up being a long answer), RAID 0+1 involves creating a RAID-0 disk stripe first, and then mirroring that stripe's contents to an identical stripe. In RAID 1+0, the disks being allocated are mirrored in pairs first and then those pairs are striped. Looks and sounds similar but there are some serious technical differences. The bottom line is that 1+0 is better than 0+1.

RAID-5 is a form of parity RAID, where data is striped across all the disks in the RAID stripe (the collection of disks that make up RAID-5), plus one extra disk. This extra disk contains calculated values that are generated by applying Boolean arithmetic to all of the data on the other disks. (What I have just described is actually RAID-4. RAID-5 works the same way, except that it takes that parity data and stripes it across all of the disks in the RAID-5 stripe to improve performance.) Any lost disk in a RAID-5 stripe can be recovered through the use of the parity information.
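The parity trick described above is just XOR arithmetic; here is a toy sketch (the "disks" are two-byte strings, purely illustrative) showing that XOR-ing the surviving disks plus the parity stripe reconstructs a lost disk:

```python
# RAID-4/5-style parity: the parity byte is the XOR of the data bytes
# at the same offset on every data disk.
data_disks = [b"\x10\x20", b"\x0f\x0f", b"\xaa\x55"]  # toy 2-byte stripes

# Compute the parity stripe byte-by-byte
parity = bytes(a ^ b ^ c for a, b, c in zip(*data_disks))

# Simulate losing disk 1 and rebuilding it from the survivors plus parity
lost = 1
survivors = [d for i, d in enumerate(data_disks) if i != lost] + [parity]
rebuilt = bytes(x ^ y ^ z for x, y, z in zip(*survivors))

print(rebuilt == data_disks[lost])  # True
```

The same property is why losing any single disk is recoverable, and why every write must update the parity stripe as well as the data stripe.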

RAID-5 writes slower than RAID-1 for several reasons, including all of the arithmetic that must be done every time a write is generated. What's more, in order to do the calculations, in some cases, data must be read from all the disks so that the calculations can be made. RAID-1 does not require any math or extra reading. The rule of thumb is that if your disks are going to do less than about 15-20% writing, then RAID-5 may be OK. Any more than that, and you should probably not do RAID-5.

When a disk is lost, and isn't that why you are looking at RAID in the first place, replacing a disk in RAID 1+0 requires copying all the data from the surviving copy of the failed disk onto the replacement disk. In RAID-5, all the data on all the disks must be read and the appropriate calculations made, before the data can be written to the replacement disk.

If you lose two disks at the same time in RAID-5, all the data is lost, and must be recovered from backup tapes. If you lose two disks in RAID 1+0, unless they happen to be both sides of the same mirror, the system will be able to recover your data without having to resort to backup tapes.

Please note that if your disks are in a hardware array, then the performance comparisons are likely invalid, since those arrays generally cache their data, hiding those performance issues.
If speed is the only concern, use RAID 0. Striping data over several disks is by far the best option for sheer performance. BUT it has NO FAULT TOLERANCE!!! If one disk dies, your data is gone.

If speed is a requirement and security is wanted - use RAID 10 (Explanation at http://www.acnc.com/04_01_50.html) - High speed and fair reliability. It can in some cases survive multiple drive failure. Drawback: Very expensive / High overhead / Tricky to scale.

If overhead and making full use of the disks is an issue, stick to good old reliable RAID 5. Speed is essentially decided by the disks, so it's their output (and the controller, of course) that will be the limitation. Put some monitoring software on the array and buy 1 or 2 replacement disks to keep on the shelf.

If performance drops below a desired level you have a legit reason to request funds for a new toy! ;-)
Just to back others up, the sums are quite easy: RAID 1 does 2 physical I/Os per logical write, RAID 5 does 4 physical I/Os per logical write, and RAID 6 (or whatever with double parity) does 6.

Example, 100 I/O per second for each disk, 33% write, 66% read, 6 drives and assuming the controller is intelligent enough to read off either disk in RAID 1 :-

RAID 5:  600 / (.66 × 1 + .33 × 4) ≈ 300
RAID 1:  600 / (.66 × 1 + .33 × 2) ≈ 450
RAID 0:  600 / (.66 × 1 + .33 × 1) ≈ 600
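The arithmetic above can be sketched as a quick Python model. The 100 IOPS per disk and the 66/33 read/write split are just the example figures from the post; exact output differs slightly from the rounded numbers above (303/455/606 vs. 300/450/600):

```python
def effective_iops(n_disks, disk_iops, read_frac, write_frac, write_penalty):
    """Logical IOPS = raw IOPS / (read_frac * 1 + write_frac * penalty)."""
    raw = n_disks * disk_iops
    return raw / (read_frac * 1 + write_frac * write_penalty)

# Write penalties: RAID 0 = 1 physical I/O per write, RAID 1 = 2, RAID 5 = 4
for name, penalty in [("RAID 0", 1), ("RAID 1", 2), ("RAID 5", 4)]:
    print(name, round(effective_iops(6, 100, 0.66, 0.33, penalty)))
```

Plugging in your own workload's read/write mix is the key step: the more write-heavy the database, the harder the RAID 5 penalty bites.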
SOLUTION
If you use 6 U320 15k RPM drives in any configuration, the bottleneck will likely be the PCI bus.  Many server mainboards have multiple PCI busses.  I think it would make sense to have 2 RAID 5 (or 3 RAID 0 if you don't need fault tolerance) arrays, with each on its own PCI bus and controlled by its own 64-bit (not 32-bit) PCI controller card.  Without knowing the exact specs of everything from the drive (RPM, avg seek, SCSI/SATA, etc.) to the controller to the mainboard to even the mainboard chipset, it's not really possible to give an exact answer.  All the suggestions above make sense so far... but for a more specific recommendation we'd need more specific information on the system.
SOLUTION
http://www.tweakers.net/reviews/432 shows the effects of PCI bus bandwidth bottleneck w/ benchmarks
SOLUTION
RAID 0 and RAID 5 both support striped reads, which will give best "read" performance.  (As far as I can see, an intelligent controller could do this with RAID 1 as well, but I know of nobody who makes a controller that implements this....)

Because RAID 5 needs to calculate and store the redundancy information, database writes will be slower than with RAID 0 or 1.

So you need to determine whether reads outnumber writes by enough to cover the write penalty with RAID 5.  This is common, but may or may not be true in your particular case.

"SCSI doesn't do that, it just reads from one of the mirrored drives unless that drive fails, then it reads from the other one."

Did I read that right???  Hm, I haven't seen a SCSI controller that works like that...