
What is RAID?

RAID (“Redundant Array of Independent/Inexpensive Disks”) is a way of combining two or more disks to gain extra capabilities. The most common reasons for using RAID are to increase:

  • total storage capacity
  • reliability
  • speed of access

It’s even possible to do all three. Sound too good to be true? There are some caveats…

Horses for Courses

Different RAID configurations have different properties. The various configurations are referred to as ‘RAID levels’, each identified by a number. There are many RAID levels; here, we’ll concern ourselves with only the more popular ones. In the sections below, we’ll talk about combining a number of 1TB (one terabyte, a thousand gigabytes) disks and compare the results.

RAID 0

Multiple disks are combined to make one large disk. Two 1TB disks in a RAID-0 configuration give a total storage capacity of 2TB. Key points:

  • capacity: the sum of the component disks, so no lost space
  • reliability: if any disk fails, all data is lost. A two-disk RAID-0 array is thus half as reliable as a single disk
  • speed: fast data access

Typical application: where fast access to ephemeral data is needed (i.e., data you don’t need to retain), such as a temporary working area for data that can easily be recreated. Generally speaking, RAID 0 is seldom used these days.
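The idea of striping can be illustrated with a minimal, purely in-memory sketch (this is not how the kernel implements it; chunk sizes and device names here are toy values):

```python
# Toy illustration of RAID-0 striping: data is dealt out round-robin
# across the member disks. Capacity is the sum of the members, but if
# any one member fails, part of every large file is gone.

def stripe(data: bytes, n_disks: int, chunk: int = 4) -> list:
    """Split data into chunks and deal them round-robin across n_disks."""
    disks = [bytearray() for _ in range(n_disks)]
    for i in range(0, len(data), chunk):
        disks[(i // chunk) % n_disks] += data[i:i + chunk]
    return [bytes(d) for d in disks]

def reassemble(disks: list, chunk: int = 4) -> bytes:
    """Interleave the chunks back into the original byte stream."""
    out = bytearray()
    offsets = [0] * len(disks)
    i = 0
    while any(offsets[j] < len(disks[j]) for j in range(len(disks))):
        j = i % len(disks)
        out += disks[j][offsets[j]:offsets[j] + chunk]
        offsets[j] += chunk
        i += 1
    return bytes(out)

data = b"The quick brown fox jumps over the lazy dog."
disks = stripe(data, 2)
assert reassemble(disks) == data               # both members present: intact
assert sum(len(d) for d in disks) == len(data) # no space lost to redundancy
```

Note that losing either member leaves only alternating fragments of the data, which is why a two-disk RAID-0 array is less reliable than a single disk.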

RAID 1

Usually two disks, although there may be more, configured so that each disk holds a full copy of all of the data. For this reason, this is sometimes called “disk mirroring”. Two 1TB disks in a RAID-1 configuration give a total storage capacity of 1TB. Key points:

  • capacity: half the disk space is lost (for a two-disk set)
  • reliability: no data loss if one drive fails
  • speed: fast reads, reasonably fast writes

Typical application: when data reliability is needed, but total storage requirements are modest. Small servers typically have two drives in a RAID-1 configuration for all of the system and the user data.
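Mirroring is conceptually simple; a toy in-memory sketch (not a real block-device implementation) shows why one disk can fail without data loss, and why reads are fast while writes cost more:

```python
# Toy illustration of RAID-1 mirroring: every write goes to all members,
# and a read can be served by any surviving member.

class Mirror:
    def __init__(self, n_disks=2):
        self.disks = [{} for _ in range(n_disks)]

    def write(self, key, value):
        for disk in self.disks:
            if disk is not None:
                disk[key] = value      # write cost: every member is written

    def fail(self, index):
        self.disks[index] = None       # simulate a dead disk

    def read(self, key):
        for disk in self.disks:        # any surviving copy will do,
            if disk is not None:       # which is why mirrored reads are fast
                return disk[key]
        raise IOError("all members failed: data lost")

m = Mirror()
m.write("report", b"quarterly figures")
m.fail(0)                                          # one disk dies...
assert m.read("report") == b"quarterly figures"    # ...data still available
```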

RAID 5

A minimum of three disks, although often more are used, with the data spread over them in such a way that the system can tolerate any one disk failing without data loss. Three 1TB disks in a RAID-5 configuration give a total storage capacity of 2TB. Key points:

  • capacity: the storage capacity of one disk is lost
  • reliability: no data loss if one drive fails
  • speed: fast reads, writes can be slower

Writing to a RAID-5 array is more complex than writing to a single disk or to the simpler RAID levels, and for this reason (amongst others) a dedicated “RAID controller” card is sometimes used, which removes the RAID-5 processing overhead from the system CPU.

Typical application: modest data stores.
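RAID 5’s ability to survive one failed disk comes from parity. The simplest way to see how parity works is with XOR; the sketch below uses a fixed parity block purely for clarity (real RAID-5 arrays rotate parity across all the disks):

```python
# The parity block is the XOR of the data blocks, so any single missing
# block can be recomputed from the survivors. (Real RAID 5 rotates parity
# across the members; a fixed parity block is shown here for clarity.)

def xor_blocks(blocks):
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

disk1 = b"AAAA"
disk2 = b"BBBB"
parity = xor_blocks([disk1, disk2])   # stored alongside the data

# disk1 fails; rebuild its contents from the survivors:
rebuilt = xor_blocks([disk2, parity])
assert rebuilt == disk1
```

This is also why RAID-5 writes can be slower: every write must update the corresponding parity block as well as the data.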

RAID 6

Similar to RAID-5, but able to tolerate the loss of two disks. A minimum of four disks is required; four 1TB disks in a RAID-6 configuration would give a total storage capacity of 2TB.

  • capacity: the storage capacity of two disks is lost
  • reliability: no data loss if two drives fail
  • speed: fast reads, writes can be slower

Writing to a RAID-6 array has the same considerations as RAID-5.

Typical application: mid-size data stores.
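The usable capacities quoted in the sections above all follow a simple pattern, sketched here (assuming, as throughout this article, that all member disks are the same size):

```python
# Usable capacity for the RAID levels discussed above, given n identical
# disks of a given size (assumes all members are the same size).

def usable_capacity(level, n_disks, disk_tb=1.0):
    if level == 0:
        return n_disks * disk_tb        # no redundancy: sum of all members
    if level == 1:
        return disk_tb                  # every member holds a full copy
    if level == 5:
        return (n_disks - 1) * disk_tb  # one disk's worth lost to parity
    if level == 6:
        return (n_disks - 2) * disk_tb  # two disks' worth lost to parity
    raise ValueError("level not covered here")

assert usable_capacity(0, 2) == 2.0   # two 1TB disks, RAID 0  -> 2TB
assert usable_capacity(1, 2) == 1.0   # two 1TB disks, RAID 1  -> 1TB
assert usable_capacity(5, 3) == 2.0   # three 1TB disks, RAID 5 -> 2TB
assert usable_capacity(6, 4) == 2.0   # four 1TB disks, RAID 6 -> 2TB
```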

RAID 10, 50 and 60

These are RAID levels built by combining disks in multiple RAID configurations. For example, RAID-60 is two RAID-6 arrays which are then combined into a RAID-0 array. These RAID levels are typically used for larger data stores.

Disk Failure

The more disks you have, the more often you will experience disk failure.

Let’s assume that a disk fails, on average, after five years. If you have an array of five disks, you can expect, on average, one disk failure per year.
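The arithmetic behind that claim is straightforward (treating the average failure rate as constant, which is a simplification; real disks follow a “bathtub curve”):

```python
# Expected failures per year = number of disks / mean disk life.
# A constant average failure rate is assumed here for illustration.

def failures_per_year(n_disks, mean_life_years=5):
    return n_disks / mean_life_years

assert failures_per_year(5) == 1.0    # five disks, five-year life: ~1/year
assert failures_per_year(10) == 2.0   # double the disks, double the failures
```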

However, if your five disks were bought at the same time from one supplier, there is a reasonable chance that they all come from the same batch, in which case their lifetimes are likely to be very similar. So when the first disk fails after, say, three years, the rest may not be far behind: by year four, all five disks may have needed replacement.

Why does that matter? Because rebuilding a RAID array takes time, and the time taken is approximately proportional to the size of the component disks: it takes longer to rebuild a 4TB disk in an array than a 1TB disk. Disk capacity is growing much faster than disk speed, so as technology advances, RAID array rebuilds take longer. Furthermore, rebuilding a RAID array puts the remaining disks under a high workload, stressing them and increasing the likelihood of another failure. In a RAID-5 array, if one of the remaining disks fails during the rebuild, all data is lost.
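A rough back-of-the-envelope model makes the proportionality concrete. The 150 MB/s rebuild rate below is an assumed, illustrative figure, not a benchmark; real rebuild speeds vary widely with hardware and workload:

```python
# Rebuild time is roughly capacity divided by sequential rebuild speed.
# 150 MB/s is an assumed, illustrative rate, not a measured one.

def rebuild_hours(disk_tb, rebuild_mb_per_s=150.0):
    seconds = (disk_tb * 1_000_000) / rebuild_mb_per_s   # TB -> MB
    return seconds / 3600

# a 4TB member takes about four times as long to rebuild as a 1TB one,
# which is four times as long for a second failure to strike
assert abs(rebuild_hours(4.0) - 4 * rebuild_hours(1.0)) < 1e-9
```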

It is for these reasons that more care must be taken when designing a large data store, and RAID 50 or (ideally) 60 is to be preferred for such environments.

Ignorance may not be bliss

The whole point of disk redundancy is that when a disk fails, the system carries on working as before. But redundant disks can only save your data if a failed disk is promptly replaced, and without effective monitoring of your disk arrays, you won’t know that a disk has failed.

If you take nothing else from this page, take this: using RAID techniques to mitigate hardware failure is pointless unless you monitor your RAID arrays.

You will know sooner or later, of course. If you have RAID-1, disk mirroring, in place, and you have no monitoring to detect the failure of one disk, you’ll certainly be aware when the second one fails. And yes, people really do do this.

Exactly how to monitor the RAID depends upon how it is implemented. RAID can be managed from Linux itself using the mdadm utility, which also provides a level of RAID monitoring and notification. Hardware RAID controllers need their own utilities, depending upon the manufacturer of the controller.

Finally

RAID technology is a great way of improving disk performance, capacity and reliability – but to do so effectively requires a little planning and forethought. Oh, and did I mention monitoring?