I've made you some graphics to help you understand how RAID works.
First of all, there are two ways a RAID can be implemented:
- By hardware
- By software
What does that mean?
This means: If you have a hardware RAID controller, there will be a card inside your PC/server that manages all I/O to your disks. The operating system needs no RAID logic of its own (at most a driver for the controller card) and will see the "RAID set" as one big disk.
A software RAID means: your operating system handles the distribution or mirroring to the disks. It is OS dependent. That means: Your OS has to support that implementation of RAID. You can't simply switch from Windows to Linux or vice versa. And: your CPU will be involved in calculating the parts that are used for redundancy, called "parity bits". I'll come to that later on.
So, what are the RAID levels?
The simplest thing you can do is just to distribute all of your data across every disk that's available. This is called "RAID 0". It is referred to as a "stripeset" since the data is striped across all the disks that make up the RAID.
This is very fast, but: If one disk fails, it's like taking a pump gun and shooting at your filesystem. It's simply gone. Unrecoverable.
This is what a RAID 0 looks like:
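In code terms, striping can be sketched roughly like this. It's only a toy model: blocks are handed out round-robin, one block per disk in turn. The function names `stripe` and `read_back` are made up for this example, and real implementations work with fixed-size chunks on block devices rather than Python lists.

```python
def stripe(blocks, num_disks):
    """Distribute blocks round-robin over num_disks disks (RAID 0 style)."""
    disks = [[] for _ in range(num_disks)]
    for i, block in enumerate(blocks):
        disks[i % num_disks].append(block)
    return disks


def read_back(disks):
    """Reassemble the original block order; a failed disk is modelled as None."""
    if any(d is None for d in disks):
        raise IOError("a disk failed: the stripeset is unrecoverable")
    n = len(disks)
    total = sum(len(d) for d in disks)
    return [disks[i % n][i // n] for i in range(total)]


disks = stripe(["A", "B", "C", "D", "E", "F"], 3)
print(disks)             # [['A', 'D'], ['B', 'E'], ['C', 'F']]
print(read_back(disks))  # ['A', 'B', 'C', 'D', 'E', 'F']
```

Set any entry of `disks` to `None` and `read_back` raises: every disk holds a piece of every larger file, so losing one loses everything.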
So you might want to have some redundancy? No problem. Here's RAID 1. It mirrors your data to another disk. That means: Every block you write to your RAID gets written to two separate physical disks. One of them may fail and there's no problem for you. Your data will be intact.
This costs no measurable performance at all when it's done by a hardware RAID controller and very little to nothing if it's done by the operating system itself (software RAID).
Here's how it looks:
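The mirroring idea, as a minimal sketch. The `Mirror` class and its methods are invented for illustration; a failed disk is simply modelled as `None`.

```python
class Mirror:
    """Toy RAID 1: every write goes to both disks, reads use any healthy disk."""

    def __init__(self):
        self.disks = [{}, {}]  # two "disks", each mapping block number -> data

    def write(self, block_no, data):
        for disk in self.disks:
            if disk is not None:
                disk[block_no] = data

    def fail(self, index):
        self.disks[index] = None  # simulate a dead disk

    def read(self, block_no):
        for disk in self.disks:
            if disk is not None:
                return disk[block_no]
        raise IOError("both disks failed")


m = Mirror()
m.write(0, "hello")
m.fail(0)
print(m.read(0))  # hello
```

Only when the second disk fails as well does `read` give up.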
You can combine both of these worlds to create something that's called "RAID 10". Which simply means: mirror every disk, then stripe across the mirrored pairs.
Here's how it looks:
It simply mirrors every disk in a stripeset.
How many disks may fail? => Depends!
If, in the worst possible case, both disks of one mirrored pair fail, everything's gone. If, in the best case, one entire stripeset fails (one disk out of each pair), everything's good. The advantage of this solution is: Best performance! Mirroring takes much less effort from the controller (hardware RAID) or the operating system (software RAID) than calculating "parities". (More on that later)
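To make the "depends" concrete, here is a small sketch that tries every possible two-disk failure. It assumes four disks numbered 0 to 3, arranged as two mirrored pairs (0,1) and (2,3) with the data striped across the pairs. The names `MIRROR_PAIRS` and `survives` are made up for this example.

```python
from itertools import combinations

# Assumed layout: disks 0+1 mirror each other, disks 2+3 mirror each other.
MIRROR_PAIRS = [(0, 1), (2, 3)]


def survives(failed):
    """The array survives as long as no mirrored pair has lost both disks."""
    return all(not (a in failed and b in failed) for a, b in MIRROR_PAIRS)


for combo in combinations(range(4), 2):
    status = "OK" if survives(set(combo)) else "DATA LOST"
    print(combo, status)
```

Of the six possible two-disk failures, only (0, 1) and (2, 3) are fatal. Any single failure is always survivable.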
Now we come to the slightly more complicated part:
Parities! Eeeh? What's that?!
It's simple: Let's assume you have five disks and want to be immune to a failure of one disk. Then you create a RAID 3 (or RAID 5 - I'll come to that later on). Both approaches promise the same thing: "One disk may fail". This can be easily explained when looking at RAID 3:
In this case, you distribute all of the data to four disks (of your five available ones). But you do some magic, too: Calculating parities. These calculations are made either by your hardware RAID controller or by your operating system (and therefore your CPU, which costs CPU cycles and performance!)
So, what's "parity"?
Parity is an astonishingly simple thing with only one rule: Count all the bits in a row that are "1" and decide: Is the number I've counted odd or even?
This picture explains it:
This is done either by your controller (hardware RAID) or by your operating system (software RAID).
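Counting whether the number of 1-bits is odd or even is exactly what XOR does, so the whole trick fits in a few lines. This is a sketch under the assumption that the parity bit is 1 when the count is odd (so-called even parity); the helper name `parity` and the sample bytes are made up for illustration.

```python
from functools import reduce


def parity(blocks):
    """XOR equally sized blocks byte by byte.

    Each bit of the result is 1 if the number of 1-bits at that
    position across all blocks is odd, and 0 if it is even.
    """
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Four data disks, each holding a two-byte block, plus one parity block.
data = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa", b"\x00\xff"]
p = parity(data)

# Disk 2 dies. XOR-ing the surviving blocks with the parity brings it back.
survivors = data[:2] + data[3:]
rebuilt = parity(survivors + [p])
print(rebuilt == data[2])  # True
```

That's why exactly one disk may fail: with one block missing, the parity tells you for every bit position whether the lost bit must have been a 1 or a 0. With two blocks missing, that information is no longer enough.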
This (RAID 3) is the worst possible RAID level!
The writing speed entirely depends on how fast you can write the parity to your dedicated parity disk, because every single write has to update that one disk.
Therefore they invented RAID 5:
RAID 5 distributes the parity over all available disks instead of keeping it on one dedicated disk.
Sounds like voodoo to you? Here's a picture to explain it clearly:
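As a text sketch of the idea: for every stripe (row), a different disk holds the parity block. The rotation order used here (starting on the last disk and moving one disk to the left per stripe) is just one assumed scheme; real controllers and software RAIDs offer several layouts. The function name `raid5_layout` is made up for this example.

```python
def raid5_layout(num_disks, num_stripes):
    """Return one row per stripe, marking data blocks (D) and parity blocks (P)."""
    rows = []
    block = 0
    for stripe in range(num_stripes):
        # Assumed rotation: the parity position moves one disk left per stripe.
        parity_disk = (num_disks - 1 - stripe) % num_disks
        row = []
        for disk in range(num_disks):
            if disk == parity_disk:
                row.append("P%d" % stripe)
            else:
                row.append("D%d" % block)
                block += 1
        rows.append(row)
    return rows


for row in raid5_layout(4, 4):
    print("  ".join(cell.ljust(3) for cell in row))
```

Every column (disk) ends up with a share of the parity blocks, so parity writes are spread over all disks instead of hammering a single one.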
So, people decided: I want more redundancy! I want to be able to handle the failure of two disks!
No problem, here's where RAID 6 kicks in: It calculates the parity in two different ways and distributes both parities across all available disks.
This is how it looks:
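To put the levels side by side, here is a small calculator for usable capacity and for the number of disk failures each level is guaranteed to survive. It assumes equally sized disks and, for RAID 10, two-way mirrors; the function name `raid_summary` and the example disk counts are made up for illustration.

```python
def raid_summary(level, num_disks, disk_size_tb):
    """Return (usable capacity in TB, number of disk failures always survived)."""
    if level == "0":
        data_disks, tolerated = num_disks, 0
    elif level == "1":
        data_disks, tolerated = 1, num_disks - 1
    elif level == "10":
        # Assumes two-way mirrors; a second failure may or may not be fatal.
        data_disks, tolerated = num_disks // 2, 1
    elif level in ("3", "5"):
        data_disks, tolerated = num_disks - 1, 1
    elif level == "6":
        data_disks, tolerated = num_disks - 2, 2
    else:
        raise ValueError("unknown RAID level: %s" % level)
    return data_disks * disk_size_tb, tolerated


for level, disks in [("0", 4), ("1", 2), ("10", 4), ("5", 5), ("6", 6)]:
    usable, tolerated = raid_summary(level, disks, 2)
    print("RAID %-2s with %d x 2 TB: %2d TB usable, survives %d failure(s)"
          % (level, disks, usable, tolerated))
```

The pattern: parity-based levels give up one disk (RAID 3/5) or two disks (RAID 6) worth of space for redundancy, while mirroring gives up half.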