Ultra-Large Scale Storage

We are investigating the construction of large-scale storage systems using object-based storage devices (OSDs). An OSD is a network-attached storage device that presents an interface of arbitrarily named, variable-size data objects rather than sequentially numbered fixed-size blocks, and that handles data storage details such as request scheduling and data layout itself. Metadata management, which is critical to scalability, reliability, and security, is handled separately by one or more specialized metadata servers (MDSs). Separating data storage and management from metadata storage and management gives large-scale distributed storage systems very high access bandwidth.


We have developed a prototype implementation of Ceph, a distributed file system based on our research. The metadata server (MDS) is based on Dynamic Subtree Partitioning, an architecture that adaptively distributes metadata across a cluster based on the current workload. Intelligent OSDs manage data replication, failure detection, and data migration during failure recovery or system expansion. Data is stored by each OSD using EBOFS, an object file system based on prior experience with OBFS. Data is distributed using CRUSH, a hash-like distribution function that allows any party to calculate (instead of looking up) the location of data. CRUSH is designed to cope with device failure and cluster expansion, while separating object replicas across failure domains for improved data safety.
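
The idea of calculating a location instead of looking it up can be illustrated with a short sketch. The code below is not CRUSH itself; it swaps in rendezvous (highest-random-weight) hashing, a simpler technique with the same "any party can compute placement" property, and adds a rule that each replica must land in a different failure domain. The names `score`, `place`, and the device and rack labels are hypothetical and chosen only for illustration.

```python
import hashlib


def score(object_name: str, device_id: str) -> int:
    """Deterministic pseudo-random score for an (object, device) pair."""
    digest = hashlib.sha256(f"{object_name}:{device_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(object_name: str, devices: dict, replicas: int = 2) -> list:
    """Pick devices for an object without consulting any lookup table.

    `devices` maps a device id to its failure domain (for example a rack).
    Devices are ranked by score, and at most one device is taken from
    each failure domain. This is an illustrative stand-in, not CRUSH.
    """
    ranked = sorted(devices, key=lambda d: score(object_name, d), reverse=True)
    chosen, used_domains = [], set()
    for dev in ranked:
        domain = devices[dev]
        if domain in used_domains:
            continue  # keep replicas in separate failure domains
        chosen.append(dev)
        used_domains.add(domain)
        if len(chosen) == replicas:
            break
    return chosen


if __name__ == "__main__":
    # Hypothetical cluster: five devices spread over three racks.
    cluster = {"osd0": "rackA", "osd1": "rackA", "osd2": "rackB",
               "osd3": "rackB", "osd4": "rackC"}
    print(place("obj-1", cluster))
```

Because placement depends only on the object name and the device list, every client computes the same answer independently. In this sketch, removing a device that holds no replica of an object leaves that object's placement unchanged, which hints at why a calculated distribution can limit data movement when the cluster changes.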

The Ceph source code is available at SourceForge.

If the data stored in large-scale storage systems is sensitive or confidential, security measures must be deployed to protect the data. We have designed and implemented Horus, a system that offers fine-grained encryption-based security for large-scale storage. Horus encrypts large datasets using keyed hash trees (KHT) to generate different keys for each region of the dataset, providing fine-grained security. Performance evaluation shows that our prototype’s key distribution is highly scalable and robust.
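
How a keyed hash tree can yield a different key for each region may be easier to see in code. The following is a minimal sketch under stated assumptions, not the Horus implementation: it assumes each child key is an HMAC-SHA256 of the node's level and index under its parent's key, and that a leaf key encrypts one block. The functions `child_key` and `derive_key`, the fanout, and the depth are hypothetical.

```python
import hashlib
import hmac


def child_key(parent_key: bytes, level: int, index: int) -> bytes:
    """Derive the key of node (level, index) from its parent's key.

    Assumption for this sketch: the keyed hash is HMAC-SHA256 over the
    node's position in the tree.
    """
    msg = f"{level}:{index}".encode()
    return hmac.new(parent_key, msg, hashlib.sha256).digest()


def derive_key(start_key: bytes, start_level: int, block_index: int,
               depth: int, fanout: int = 2) -> bytes:
    """Walk down from a known node key to the leaf key for `block_index`.

    `start_key` must belong to a node at `start_level` whose subtree
    covers `block_index`; the caller is responsible for that. Level 0
    is the root and level `depth` holds the per-block leaf keys.
    """
    if not 0 <= block_index < fanout ** depth:
        raise ValueError("block_index outside the tree")
    key = start_key
    for level in range(start_level + 1, depth + 1):
        # Index of the node at this level whose range covers the block.
        index = block_index // fanout ** (depth - level)
        key = child_key(key, level, index)
    return key


if __name__ == "__main__":
    root = b"\x00" * 32  # placeholder root key, for illustration only
    leaf = derive_key(root, 0, block_index=5, depth=3)
    # The holder of the level-1 key covering blocks 4..7 derives the same leaf key.
    mid = child_key(root, 1, 1)
    print(derive_key(mid, 1, block_index=5, depth=3) == leaf)
```

In this sketch, handing out the key of an interior node grants access only to the blocks beneath it, since a keyed hash cannot be inverted to recover the parent or sibling keys. That is one way to picture the fine-grained access described above.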


Last modified 23 May 2019