Erasure Coding

  • What is it?
    Erasure Coding is a method of data protection for storage systems. Data is broken into fragments, encoded with additional redundant fragments, and then stored across different locations or storage media within a distributed storage system. A sufficiently large subset of these fragments is enough to regenerate the original data. The goal of erasure coding is to allow data that is lost or corrupted at some point in the storage process to be reconstructed from the fragments stored elsewhere within the storage system.
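
    As a rough illustration of the fragment-and-reconstruct idea, the sketch below splits data into k fragments and adds a single XOR parity fragment, so any one lost fragment can be rebuilt from the survivors. This is a deliberately simplified assumption: production systems generally use stronger codes (Reed-Solomon, for example) that tolerate several simultaneous losses, and the function names here are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list:
    """Split data into k equal fragments and append one XOR parity fragment."""
    frag_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(frag_len * k, b"\0")  # pad so fragments are equal length
    fragments = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    parity = reduce(xor_bytes, fragments)
    return fragments + [parity]


def reconstruct(fragments: list, original_len: int) -> bytes:
    """Rebuild the data when at most one fragment is missing (marked None)."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity can recover only one lost fragment")
    fragments = list(fragments)
    if missing:
        # XOR of all surviving fragments equals the missing one.
        survivors = [f for f in fragments if f is not None]
        fragments[missing[0]] = reduce(xor_bytes, survivors)
    # Drop the parity fragment and the padding.
    return b"".join(fragments[:-1])[:original_len]


payload = b"erasure coding demo"
stored = encode(payload, k=4)   # 4 data fragments + 1 parity fragment
stored[2] = None                # simulate losing one fragment
recovered = reconstruct(stored, len(payload))
```

    Each of the five fragments would be written to a different disk or server, so that the failure of any single one leaves enough information to recover the payload.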

    Erasure Coding is an alternative to replication (simply making multiple copies of data). It can achieve the same or better protection against data loss as replication while reducing the total raw disk capacity required by 50% to 80%.
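
    The capacity saving can be checked with simple arithmetic. The sketch below compares three-way replication with a 4+2 erasure code (4 data fragments plus 2 redundant fragments); both schemes are assumptions chosen for illustration because each tolerates the loss of any two copies or fragments. The actual saving depends on which schemes are compared.

```python
def raw_capacity(usable_tb: float, data_units: int, redundancy_units: int) -> float:
    """Raw capacity needed to store usable_tb of data.

    data_units and redundancy_units describe the scheme: replication with
    three copies is 1 data unit + 2 redundant units; a 4+2 erasure code is
    4 data units + 2 redundant units.
    """
    return usable_tb * (data_units + redundancy_units) / data_units


usable = 100.0  # TB of user data (illustrative figure)

replication_raw = raw_capacity(usable, data_units=1, redundancy_units=2)  # 300.0 TB
erasure_raw = raw_capacity(usable, data_units=4, redundancy_units=2)      # 150.0 TB

saving = 1 - erasure_raw / replication_raw  # 0.5, i.e. 50% less raw capacity
```

    Under these assumptions the erasure-coded layout needs half the raw capacity of replication, which corresponds to the lower end of the range quoted above.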

  • Why do you need it?
    Erasure Coding, like RAID (Redundant Array of Independent Disks), is a method of ensuring data availability and safety in a storage system. RAID is well suited to data redundancy within a single server: if one disk fails, the data can be recovered from the other disks. In a distributed storage system, however, data should be spread across disks in different servers so that the system can tolerate the loss of an entire server. RAID is a disk-level concept, whereas Erasure Coding does not restrict data to a set of disks in a single server, which makes it more suitable for distributed storage platforms that span different storage media, systems, or physical locations.
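
    The effect of spreading fragments across servers can be sketched as follows. The example assumes a code in which any k of the stored fragments are enough to rebuild the data (as with Reed-Solomon codes) and uses a simple round-robin placement; the server names and helper functions are hypothetical.

```python
def place_fragments(num_fragments: int, servers: list) -> dict:
    """Assign fragment indices to servers in round-robin order."""
    return {i: servers[i % len(servers)] for i in range(num_fragments)}


def survives_server_loss(placement: dict, k: int, failed_server: str) -> bool:
    """True if at least k fragments remain after one server is lost."""
    remaining = sum(1 for server in placement.values() if server != failed_server)
    return remaining >= k


k, m = 4, 2  # 4 data fragments + 2 redundant fragments (illustrative scheme)

# One fragment per server: losing any single server leaves 5 of 6 fragments.
wide = place_fragments(k + m, ["srv1", "srv2", "srv3", "srv4", "srv5", "srv6"])
wide_ok = all(survives_server_loss(wide, k, s) for s in set(wide.values()))

# Same fragments packed onto two servers: losing one leaves only 3 fragments.
narrow = place_fragments(k + m, ["srv1", "srv2"])
narrow_ok = all(survives_server_loss(narrow, k, s) for s in set(narrow.values()))
```

    With one fragment per server the data survives the loss of any single server, whereas packing the same fragments onto two servers does not, even though the amount of redundancy is identical.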

    Compared to RAID, Erasure Coding can reduce the time and overhead required to reconstruct data. However, it comes with costs: the redundant fragments must be computed and processed, which makes it more CPU-intensive and can translate into increased latency. Erasure Coding is therefore a good fit for storage platforms where high capacity is important (less capacity is consumed by redundancy, so more is available for actual storage use) or where cost efficiency is important (less physical capacity is needed to store and protect the data). Where high performance matters more, other data protection methods such as replication may be more suitable.

  • How is GIGABYTE helpful?
    GIGABYTE and Bigtera have partnered to offer a range of VirtualStor software-defined storage appliances that provide Erasure Coding as a form of data protection. VirtualStor is available in three different products (Scaler, Converger or Extreme) according to the software environment and storage performance required, with storage capacity options ranging from multiple terabytes to multiple petabytes.