HADRAM

HADRAM: Highly Available Distributed Random Access Memory

Overview

The advent of cluster computing, grid computing, peer-to-peer computing, and other forms of distributed computing where data can be stored on an in principle unlimited number of sites necessitate the development of new types of data structures, in which key-based operations (access, insertion, deletion, modification) perform independently of the number of sites and in which searches are executed in parallel at all sites. This lead to the concept of Scalable Distributed Data Structures [1], [2], [3], [4], which are especially attractive when storing data in the distributed RAM of a multi-computer. Growth over a larger number of sites poses problems not only for query execution times, but also for availability. To achieve scalable high-availability, we need to increase the amount of redundancy with which we store data in dependence on the number of sites currently used. In this context, we can measure availability by the maximum number of sites with which we can avoid data loss for sure. For a example, a one-available SDDS-file will survive one site outage for sure, but may or may not have all data available if two sites become unavailable. While there were SDDS developed with fixed, and often arbitrary levels of availability, LH*RS, a SDDS version of Linear Hashing [5], was the first to offer scalable high availability [6], [7], [8]. LH*RS achieves this scalable high-availability by grouping sites into reliability groups of size m to which it adds a variable number k of parity sites. If all data carrying sites belong to a reliability group with k parity sites, then the SDDS-file is k-available. Records in the data sites have ranks and records with the same rank form a record group. Using Reed-Solomon erasure correcting codes, we generate for each record group k parity records stored in the parity sites in the reliability group. As the LH*RS file scales up, the splitting mechanism of LH* is used to adjust the number of k to the size of the file.

Implementing LH*RS was the topic of the thesis of Rim Moussa at the University of Paris 9, Dauphine, and took her four years of dedicated work. While based on the experience gained, implementation of scalable availability for another type of SDDS such as RP* might proceed faster, the current state of the art individually adds scalable high-availability to each SDDS individually. Our proposal - HADRAM - instead proposes to implement scalable availability as a layer under the SDDS layer. HADRAM provides the following services to the SDDS:
•  High availability at a given level k.
•  Possibility to update the availability level.
•  Recovery of lost data on an SDDS specified spare server.

HADRAM achieves this interface by grouping m about equally sized data buckets (located on different sites) into a reliability group and adding an adjustable number k of parity buckets to each reliability group. The contents of the parity buckets is calculated using a Reed Solomon erasure correcting code, actually the same one used as for the LH*RS implementation.. The concept is thus very similar to LH*RS. The difference is that LH*RS uses record groups and parity records, whereas HADRAM only provides flat storage in the data buckets that it makes available to the SDDS layer.

Contribution

The contribution of the HADRAM project to SDDS lies in:

References:

[1] Steven D. Gribble, Eric A. Brewer, Joseph M. Hellerstein, and David Culler: Scalable, Distributed Data Structures for Internet Service Construction. OSDI 2000.

[2] Litwin, W. Neimat, M-A., Schneider, D. LH* : Linear Hashing for Distributed Files. ACM-SIGMOD Intl. Conf. On Management of Data, 1993.

[3] Litwin, W., Neimat, M-A., Schneider, D. RP* : A Family of Order-Preserving Scalable Distributed Data Structures. 20th Intl. Conf on Very Large Data Bases (VLDB), 1994.

[4] SDDS Bibliography at CERIA, TU Paris (Dauphine), ceria.dauphine.fr/SDDS-bibliograhie.html

[5] Witold Litwin: Linear Hashing : a new tool for file and tables addressing. Reprinted from VLDB-80 in READINGS IN DATABASES. 2-nd ed. Morgan Kaufmann Publishers, Inc., 1994. Stonebraker , M.(Ed.).

[6] Witold Litwin, Thomas Schwarz, S.J.: LH*RS: A High Availability Scalable Distributed Data Structure using Reed Solomon Codes. Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, May 16-18, 2000, Dallas, Texas, also SIGMOD Records, vol. 29 (2), June 2000, p. 237-248.

[7] Witold Litwin, Rim Moussa, Thomas Schwarz: LH*RS – A Highly-Available Scalable Distributed Data Structure, Transactions on Database Systems (TODS). Vol. 30(3). September 2005.

[8] Rim Moussa, Witold Litwin, Thomas J. E. Schwarz, S.J.: LH*RS, A highly available distributed data storage system. (Demonstration). Proc. of the 30th VLDB Conference, Toronto, Canada, 2004.

[9] Damian Cieslicki, Stefan Schäckeler, Thomas Schwarz: Highly available distributed RAM (HADRAM). Scalable availability for scalable distributed data structures. Proceedings of the Workshop on Distributed Algorithms and Structures, Santa Clara, January 4 – 5, 2006.

[10] Damian Cieslicki , Stefan Schäckeler, Thomas Schwarz: Efficient Updates in Highly Available Distributed Random Access Memory. The Twelfth International Conference on Parallel and Distributed Systems (ICPADS), 2006. (Copy of submission: pdf)

[11] Witold Litwin, Thomas Schwarz, S.J.: Algebraic Signatures for Scalable Distributed Data Structures. Proceedings of the 20th International Conference on Data Engineering (ICDE), Boston, March 30 - April 2, 2004.

[12] T. Schwarz. Verification of Parity Data in Large Scale Storage Systems. Parallel and Distributed Processing Techniques and Applications, PDPTA'04, Las Vegas, June 21-24, 2004.

[13] Thomas Schwarz and Ethan Miller: Store, Forget, and Check: Using Algebraic Signatures to Check Remotely Administered Storage. International Conference on Distributed Computer Systems, Lisboa, Portugal, July 2006 (ICDCS 2006).

[14] J.J. Metzner: Efficient replicated remote file comparison, IEEE Transactions on Computers. Vol 40(5), 5651-660.

[15] Zan OUyang, Nasir Menon, Torsten Suel, Dimitre Trendafilov: Cluster Based Delta Compression of a Collection of Files, Proc. Web Information Systems Engineering, 2002.

[16] Suel, T. Noel, P. Trendafilov, D.: Improved file synchronization techniques for maintaining large replicated collections over slow networks. Proceedings of the 20th International Conference on Data Engineering (ICDE), Boston, March 30 - April 2, 2004.

Status

Completion expected in 2007.

People

Funding

Publications

©2006 Thomas Schwarz, S.J., COEN, SCU SCU COEN T. Schwarz