Understanding RAID levels

Why is RAID so difficult to understand? Probably because the algorithms are fairly complex and there are so many RAID levels. Possibly because it sounds complicated. Most likely because it’s too easy to get lost in the details and miss the basic logic.

Today I’m going to explain the basic logic used in the most popular forms of RAID. This is not a perfect mathematical representation, but it should give technical readers an overview of what is happening within the technology.

This is not designed as a technical specification, and it’s not designed as a guide on how to build your own RAID. This is simply the fuzzy maths that makes RAID more understandable to the project manager or network administrator who isn’t directly involved in storage but has a technical background.

XOR

XOR is something we have to touch on briefly here. The basic principle is that if two bits are identical it produces a 0, whereas if they are different it produces a 1.

1 XOR 1 is 0, 1 XOR 0 is 1, 0 XOR 1 is 1, 0 XOR 0 is 0. That’s all you need to know right now.
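As a quick check, the four cases above can be reproduced with Python's built-in `^` operator, which performs XOR on integers. This is only an illustrative sketch, and the helper name `xor_table` is made up for this example.

```python
def xor_table():
    """Return the XOR truth table as a list of (a, b, a XOR b) tuples."""
    return [(a, b, a ^ b) for a in (0, 1) for b in (0, 1)]


# Print one line per combination of input bits.
for a, b, result in xor_table():
    print(f"{a} XOR {b} = {result}")
```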

RAID 0

RAID 0 is simple: it’s multiple disks with information spread over them. The more disks you have, the more information you can store. There is no redundancy, so if one disk dies you lose your RAID array.

Drive A	Drive B
1	0
1	1
1	0
0	0
0	1

Space = number of drives * drive space

Two one terabyte drives equate to two terabytes of usable storage; 2TB = 2 * 1TB.
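To make the idea of spreading information over drives concrete, here is a minimal sketch that deals bits out to the drives in turn and then stitches them back together. It assumes bit-by-bit round-robin purely for illustration (real arrays stripe in much larger chunks), and the function names `stripe` and `unstripe` are invented for this example.

```python
def stripe(bits, num_drives):
    """Deal bits out to the drives round-robin, one bit at a time."""
    drives = [[] for _ in range(num_drives)]
    for i, bit in enumerate(bits):
        drives[i % num_drives].append(bit)
    return drives


def unstripe(drives):
    """Rebuild the original sequence; every drive is needed to do so."""
    n = len(drives)
    total = sum(len(d) for d in drives)
    return [drives[i % n][i // n] for i in range(total)]


# This bit stream produces the Drive A / Drive B columns from the table above.
bits = [1, 0, 1, 1, 1, 0, 0, 0, 0, 1]
drive_a, drive_b = stripe(bits, 2)
print(drive_a, drive_b)
```

Because each drive holds only every other bit, losing either list makes `unstripe` impossible, which is the toy version of losing the whole array.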

RAID 1

When we get to RAID 1, the focus shifts from storage space to redundancy. You have your data directly duplicated in two separate locations. Every write happens twice, while a read only needs one of the copies, so you can see an increase in read performance as data can be read from two places simultaneously.

Drive A	Drive B
1	1
1	1
0	0
0	0
1	1

Space = (number of drives * drive space) / 2

Two one terabyte drives equate to one terabyte of usable storage; 1TB = (2 * 1TB)/2.
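The mirroring behaviour can be sketched in a few lines. This is a toy model under simple assumptions (two copies, one bit per write), not how any real controller is implemented, and the `Mirror` class is a hypothetical name used only here.

```python
class Mirror:
    """Toy two-way mirror: every write goes to both copies."""

    def __init__(self):
        self.copies = [[], []]

    def write(self, bit):
        # The same bit is stored twice, once per drive.
        for copy in self.copies:
            copy.append(bit)

    def read(self, index, failed=None):
        """Read from the first copy that is not marked as failed."""
        for drive, copy in enumerate(self.copies):
            if drive != failed:
                return copy[index]
        raise IOError("no surviving copy")


mirror = Mirror()
for bit in [1, 1, 0, 0, 1]:
    mirror.write(bit)

# Even with drive 0 marked as failed, the data is still readable from drive 1.
print([mirror.read(i, failed=0) for i in range(5)])
```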

RAID 5

RAID 5 is a way to ensure data safety without losing out on all your disk space. The drives have distributed parity, which basically means that each drive carries a bit of information to restore the other drives. In a three drive array each drive carries two thirds data and one third parity information to restore the other drives; effectively meaning that if you lose a drive out of a three drive array there is enough information to recover the array from the data held on the other drives.

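So, how does parity work? If 1 XOR 0 is 1 and you lose the 0, you still have the 1 and the parity result of 1, and the only bit that gives 1 when XOR’d with 1 is a 0. Put another way, XOR the surviving bit with the parity bit (1 XOR 1 = 0) and you get the missing bit back. This is how you can recover a whole drive from the parts remaining in the array. Simple fuzzy maths that explains the concept for you.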
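The recovery trick described above works on whole blocks as well as single bits. The following is a minimal sketch, assuming equal-length byte blocks and a single XOR parity block; the `parity` helper and the sample data are made up for illustration.

```python
from functools import reduce
from operator import xor


def parity(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


data_a = b"RAID"
data_b = b"five"
parity_block = parity([data_a, data_b])

# Pretend the drive holding data_b has died: XOR the survivors to rebuild it.
recovered = parity([data_a, parity_block])
assert recovered == data_b
```

The same function is used to create the parity and to rebuild the missing block, which is what makes XOR such a convenient choice.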

In the example the parity sum is shown and the parity bit in each row is marked with square brackets to make it easier to identify. The parity position moves from drive to drive on each row, and you can see how the equivalent of one drive is used on parity in the array.

Drive A	Drive B	Drive C	Parity Sum
1	1	[0]	1 XOR 1 = 0
1	[1]	0	1 XOR 0 = 1
[0]	1	1	1 XOR 1 = 0
0	1	[1]	0 XOR 1 = 1
0	[1]	1	0 XOR 1 = 1
[0]	0	0	0 XOR 0 = 0
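The rows in the table can be generated from a plain stream of data bits. The sketch below assumes the parity position starts on the last drive and moves one drive to the left on each row, which is one layout consistent with the parity sums shown; real implementations may rotate differently. The function name `raid5_rows` is invented for this example, and it expects the number of data bits to be a multiple of the data bits per row.

```python
def raid5_rows(data_bits, num_drives=3):
    """Lay data bits out in rows, rotating the parity position each row."""
    per_row = num_drives - 1  # data bits per row; the remaining slot is parity
    rows = []
    for row_index, start in enumerate(range(0, len(data_bits), per_row)):
        chunk = data_bits[start:start + per_row]
        parity = 0
        for bit in chunk:
            parity ^= bit
        # Assumed rotation: last drive first, then one drive left per row.
        parity_slot = (num_drives - 1) - (row_index % num_drives)
        row = list(chunk)
        row.insert(parity_slot, parity)
        rows.append(row)
    return rows


data = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0]
for row in raid5_rows(data):
    print(row)
```

A handy property to check is that every row XORs to 0, since the parity bit is by definition the XOR of the data bits beside it.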

Space = (number of drives – 1) * drive space

Three one terabyte drives equate to two terabytes of usable storage; 2TB = (3 – 1) * 1TB.

RAID 6

RAID 6 expands on RAID 5 to include a second layer of parity. It provides the ability to handle two drives failing in an array.

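No need to show the parity sum here; the idea is the same as in the example above but with more drives, although real implementations typically calculate the second parity differently so that any two lost drives can be recovered. In a four drive array, two of the four bits in each row are parity. You can see how the equivalent of two drives is used on parity in the array.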

Drive A	Drive B	Drive C	Drive D
1	0	1	0
1	1	1	0
1	0	1	0
1	0	1	1
0	1	1	1
0	0	0	0

Space = (number of drives – 2) * drive space

Four one terabyte drives equate to two terabytes of usable storage; 2TB = (4 – 2) * 1TB.
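The four space formulas from this article can be gathered into one small helper. This is just a sketch of the arithmetic above, assuming identical drives and a two drive mirror for RAID 1; the function name `usable_tb` is made up for illustration.

```python
def usable_tb(level, num_drives, drive_tb):
    """Usable space in terabytes for the RAID levels covered in this article."""
    if level == 0:
        return num_drives * drive_tb
    if level == 1:
        return (num_drives * drive_tb) / 2
    if level == 5:
        return (num_drives - 1) * drive_tb
    if level == 6:
        return (num_drives - 2) * drive_tb
    raise ValueError(f"RAID level {level} is not covered here")


# The worked examples from each section.
for level, drives in [(0, 2), (1, 2), (5, 3), (6, 4)]:
    print(f"RAID {level} with {drives} x 1TB drives: {usable_tb(level, drives, 1)}TB")
```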

Once again, this is not meant to be a full guide to RAID technologies, just a basic understanding of the mathematics behind them.

Regards,
Robert Small.

1 Response to Understanding RAID levels

Alec Weder says:

Jan 7, 2015 at 10:41 am

The biggest issue with RAID is unrecoverable read errors.
If you lose a drive, the RAID has to read 100% of the remaining drives, even if there is no data on portions of them. If you get an error on rebuild, the entire array will die.

http://www.enterprisestorageforum.com/storage-management/making-raid-work-into-the-future-1.html

A UER on SATA of 1 in 10^14 bits read means a read failure every 12.5 terabytes. A 500 GB drive has 0.04E14 bits, so in the worst case rebuilding that drive in a five-drive RAID-5 group means transferring 0.20E14 bits. This means there is a 20% probability of an unrecoverable error during the rebuild. Enterprise class disks are less prone to this problem:

http://www.lucidti.com/zfs-checksums-add-reliability-to-nas-storage
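The arithmetic in the quoted passage can be checked with a short sketch. It assumes decimal gigabytes (500 GB = 500 * 10^9 bytes) and that every bit read fails independently at the quoted rate, which is a simplification; the function name `rebuild_risk` is made up for this example. Under those assumptions the quoted 20% is the expected number of errors, and the chance of hitting at least one error comes out slightly lower, at roughly 18%.

```python
import math

URE_RATE = 1e-14  # one unrecoverable read error per 10^14 bits, as quoted


def rebuild_risk(num_drives, drive_bytes, ure_rate=URE_RATE):
    """Return (expected errors, probability of at least one error)."""
    bits = num_drives * drive_bytes * 8
    expected = bits * ure_rate
    # P(at least one) = 1 - (1 - rate)^bits, computed in a numerically stable way.
    p_any = -math.expm1(bits * math.log1p(-ure_rate))
    return expected, p_any


expected, p_any = rebuild_risk(5, 500e9)
print(f"expected errors: {expected:.2f}, chance of at least one: {p_any:.1%}")
```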