# How does RAID 5 work? The Shortest and Easiest explanation ever!

We all have limited time to study long and complicated RAID theory, but you may still wonder how RAID 5 works. We have made it simple for you with the shortest and easiest explanation ever.

First, we need to remind you of the definition of XOR:

The XOR function returns 1 if its two arguments differ.

XOR (0, 1) = 1
XOR (1, 0) = 1

The XOR function returns 0 if its two arguments are the same.

XOR (0, 0) = 0
XOR (1, 1) = 0
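The same truth table can be checked with Python's bitwise XOR operator `^`, which is all we need for the rest of this walkthrough:

```python
# XOR truth table: 1 when the inputs differ, 0 when they match.
for a in (0, 1):
    for b in (0, 1):
        print(f"XOR({a}, {b}) = {a ^ b}")
```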

Now let us assume we have 3 drives with the following bits:

| 101 | 010 | 011 |

Now we calculate the XOR of that data and place the result on a 4th drive:

XOR (101, 010, 011) = 100     (XOR (101, 010) = 111, and then XOR (111, 011) = 100)

So the data on the four drives looks like this:

| 101 | 010 | 011 | 100 |
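The parity calculation above can be sketched in a few lines of Python, modeling each drive's three bits as an integer:

```python
# Parity for the 3-drive example: XOR all the data drives together.
# Each drive's bits are modeled as a Python int; ^ is bitwise XOR.
drives = [0b101, 0b010, 0b011]

parity = 0
for d in drives:
    parity ^= d

print(f"parity = {parity:03b}")  # parity = 100
```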

Now let’s see how the XOR MAGIC works. Let’s assume the second drive has failed. When we calculate the XOR of all the remaining data, we get back exactly the data from the missing drive.

| 101 | (010) | 011 | 100 |

XOR (101, 011, 100) = 010

You can check the other drives too: the XOR of the remaining data will always give you exactly the data of your missing drive.

| 101 | 010 | (011) | 100 |

XOR (101, 010, 100) = 011
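The rebuild step is the same XOR loop, just run over whichever drives survive. A minimal sketch (the `rebuild` helper name is ours, not part of any RAID implementation):

```python
# Sketch: rebuild any one missing drive by XOR-ing all the survivors,
# data drives and parity drive alike.
def rebuild(survivors):
    """XOR together the surviving drives (data + parity)."""
    missing = 0
    for d in survivors:
        missing ^= d
    return missing

# Full set: data drives 101, 010, 011 and parity 100.
# Drop the second drive (010) and recover it from the other three:
print(f"{rebuild([0b101, 0b011, 0b100]):03b}")  # 010
```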

What works for 3 bits and 4 drives works for any number of bits and any number of drives. A real RAID 5 most commonly uses a stripe size of 64 KiB (65536 bytes * 8 = 524288 bits), so the real XOR engine has to deal with 524288 bits at a time instead of the 3 bits in our exercise. This is why RAID 5 needs a very efficient XOR engine to calculate parity quickly. By adding one drive's worth of parity, you can rebuild the missing data after any single drive failure.
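Scaling the same idea up to stripe size is only a matter of XOR-ing longer bit strings. A sketch with 64 KiB byte buffers standing in for stripes (real controllers do this in hardware, often with dedicated XOR engines):

```python
# Sketch: XOR at stripe granularity, assuming a 64 KiB stripe and
# three data drives. Random bytes stand in for real user data.
import os

STRIPE = 64 * 1024  # 65536 bytes = 524288 bits

stripes = [os.urandom(STRIPE) for _ in range(3)]  # three data drives
parity = bytes(a ^ b ^ c for a, b, c in zip(*stripes))

# Lose drive 1 and recover its stripe from the other two plus parity:
recovered = bytes(a ^ c ^ p for a, c, p in zip(stripes[0], stripes[2], parity))
assert recovered == stripes[1]
```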

In our example we have actually explained RAID 4, where parity sits on a dedicated drive. RAID 5 distributes the parity evenly between all drives. Distributed parity provides a slight increase in performance, but the XOR magic is the same.
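The difference between the two levels is just where each stripe's parity lands. A sketch of one possible rotation scheme (the exact rotation order varies by controller; this layout is illustrative, not any specific vendor's):

```python
# Sketch: which drive holds parity for a given stripe.
# RAID 4: a fixed, dedicated parity drive.
# RAID 5: parity rotates each stripe, so no single drive
#         absorbs all the parity writes.
def parity_drive(stripe, num_drives, raid_level=5):
    if raid_level == 4:
        return num_drives - 1                      # always the last drive
    return (num_drives - 1 - stripe) % num_drives  # rotates each stripe

for s in range(4):
    print(f"stripe {s}: parity on drive {parity_drive(s, 4)}")
```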



Commented:
Actually, distributed parity can only provide a slight increase in performance in READ operations.  You are GUARANTEED a performance hit in write operations.  Why?  Because the controller has to not only read (or re-read) data on the disks, but then write to them again to save the parity information.

When you write to a RAID 5 you have to make sure correct parity info is written across the disks in the RAID set.  This hurts performance.
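The read-then-write cycle being described is the classic RAID 5 "small write" penalty: updating one block costs two reads and two writes. A sketch of the parity update rule, using illustrative names and the 3-bit stripe from the article:

```python
# Sketch of the RAID 5 small-write path: to overwrite one block you
# read the old data and old parity, then write new data and new parity.
# New parity = old parity XOR old data XOR new data.
def update_block(old_data, old_parity, new_data):
    new_parity = old_parity ^ old_data ^ new_data
    return new_data, new_parity

# Stripe from the article: data 101, 010, 011 with parity 100.
# Overwrite the 010 block with 111:
data, parity = update_block(0b010, 0b100, 0b111)
print(f"new data = {data:03b}, new parity = {parity:03b}")
```

Note that the other data drives (101 and 011) never need to be read; that is what keeps the penalty at two reads and two writes regardless of array width.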

Commented:
dlethe, the real issue here is latency, and it may be slightly different, or a non-issue, depending on whether you're running SSDs or mechanical storage.

We will assume the more common, affordable, and sizeable (also more mature) technology: your typical RAID 5 running mechanical drives.  The issue is that you're writing data to several drives at once.  You are forcing the heads to seek to wherever the parity data is stored on each drive and to write it every so often.  This is time your drive heads spend not reading or writing your actual money-making data.

Depending on how your firmware and driver software are optimized, you can defer those writes to whenever the drive has some downtime, which is nearly never in environments where RAID 5/10, etc. are justified.

Also, anyone ever notice how long it takes to reintegrate a drive, even a very fast drive on a very fast server in a high workload environment?

It goes up to orders of magnitude longer than restoring even multiple mirror arrays.  There are, of course, ways around it, but the issue is always whether you've got more time or money to spend on the problem.  (I've seen RAID 5 arrays on the terabyte (tebibyte now?) order taking a day or two to rebuild on a low-traffic, high-uptime server, and we're talking about a PERC 5, quite new at the time, and a PERC 6 right alongside a P400.)  Scary stuff to realize you could have a second drive fail in the time it takes to reintegrate and rebuild the first drive.

Open-E, the writeup, however, is an excellent idea and very informative.