Document Type

Article

Language

eng

Publication Date

11-4-2019

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Source Publication

2019 35th Symposium on Mass Storage Systems and Technologies (MSST)

Abstract

Systems suffer component failures at sometimes unpredictable rates. Storage systems are no exception; they add redundancy in order to deal with various types of failures. The additional storage constitutes an important capital and operational cost and needs to be dimensioned appropriately. Unfortunately, storage device failure rates are difficult to predict and change over the lifetime of the system. Large disk-based storage centers provide protection against failure at the level of objects. However, this abstraction makes it difficult to adjust to a batch of devices that fail at a higher than anticipated rate. We propose here a solution that uses large pods of storage devices of the same kind, but that can reorganize in response to an increased number of component failures seen elsewhere in the system or to an anticipated higher failure rate such as infant mortality or end-of-life fragility. Here, we present ways of organizing user data and parity data that allow us to move from three-failure tolerance to two-failure tolerance and back. A storage system using disk drives that might be suffering from infant mortality can switch from an initially three-failure-tolerant layout to a two-failure-tolerant one once the disks have been burnt in. It gains capacity by shedding failure tolerance that has become unnecessary. A storage system using Flash can sacrifice capacity for reliability once its components have undergone many write-erase cycles and thereby become less reliable. Adjustable reliability is easy to achieve with a standard layout based on RAID Level 6 stripes, where components containing user data can readily be converted to ones containing parity data. Here, we present layouts that, unlike the RAID layout, use only exclusive-or operations and do not depend on sophisticated but power-hungry processors. Their main advantage is a noticeable increase in reliability over RAID Level 6.

Comments

Accepted version. 2019 35th Symposium on Mass Storage Systems and Technologies (MSST) (November 4, 2019): 217-229. DOI. © 2019 Institute of Electrical and Electronics Engineers (IEEE). Used with permission.

