Storage QoS

This project is no longer active. Information is still available below.
Storage systems for large and distributed clusters of compute servers are themselves large and distributed. Their complexity and scale makes it hard to manage these systems, and in particular they make it hard to ensure that applications using them get good, predictable performance. At the same time, shared access to the system from multiple applications, users, and competition from internal system activities leads to a need for predictable performance.

This project investigates mechanisms for improving storage system performance in large distributed storage systems through mechanisms that integrate the performance aspects of the path that I/O operations take through the system, from the application interface on the compute server, through the network, to the storage servers. We focus on five parts of the I/O path in a distributed storage system: I/O scheduling at the storage server, storage server cache management, client-to-server network flow control, client-to-server connection management, and client cache management.

Much of the existing work on QoS for storage considers management of the individual elements of the path that I/Os take through a storage system, but little of the work considers end-to-end management of the whole path. The problem with a naive chaining of multiple management algorithms along the path (e.g., one algorithm for the network, and another for the the storage server) is that emergent behaviors that arise from such chaining can reduce the overall performance of the system. Also, much of the existing work is specific to continuous media and other applications with periodic real-time I/O workloads, as opposed to applications with general workloads.

The unifying idea in this project is that the storage server should control data movement between clients and the server. Only storage server has knowledge of the I/O demands across all its clients. The server is also more likely to contain a bottleneck resource than any individual client is. Accordingly, the server can make I/O scheduling decisions to balance client usage, can manage cache space taking into account the workload from all clients contending for the cache, and can manage the network flow.

The techniques build on our current I/O scheduling work, which allow applications to specify quality of service for I/O sessions. The QoS includes a reserved (minimum) performance, a limit on performance, and fair sharing of extra performance among sessions. The project extends the scheduling work to improve disk utilization, and then uses QoS and utilization information to guide cache management and network flow control decisions. We also investigate how we can use machine learning techniques to predict near-future resource demand in order to handle those clients that are connected over long-latency links.

These techniques should help storage systems to scale to support the compute clusters currently being planned. Large scale means sharing, both within one application and between applications. Performance management ensures that each application or client gets good performance. For example, when many nodes are computing simulation data and other nodes are visualizing that data, the two can proceed without interference. Large scale also means there will always be system maintenance going on to handle failure and replacement. Performance management ensures that maintenance can proceed without interfering with applications.


We are in the process of building a new Linux device driver, Fahrrad, to implement disk I/O scheduling. This driver builds on the RBED CPU scheduler and the RAD real-time model, and on our experience with the Zygaria I/O scheduler driver.


Last modified 23 May 2019