This dissertation focuses on algorithm design and prototype implementation of fair sharing policies in cloud datacenters with multiple resource types. Specifically, it seeks to address two fundamental resource management problems.;First, how should the the computing resources of a large-scale cluster ---such as CPU cores, memory, and storage --- be fairly shared among running applications? This problem has become even more complicated in cloud datacenters, mainly due to their unprecedented heterogeneity and complexity. Cloud computing clusters are likely to be constructed from a large number of commodity servers spanning multiple generations, with different computing capabilities, bandwidth, and storage capacities. On the other hand, depending on the underlying applications, computing tasks may require vastly different amounts of resources: some are CPU-intensive; some are memory- or bandwidth-bound. Despite these complexities, existing resource sharing policies either result in significant resource fragmentation, or require a homogeneous cluster where servers are of the same specification.;This dissertation proposes Dominant Resource Fairness in Heterogeneous systems (DRFH), a general multi-resource sharing policy that preserves many axiomatically, and highly desirable "fair" properties that used to be provided by weighted fair sharing in the single-resource setting. DRFH eliminates the application's incentive of cheating the cluster scheduler for more allocation share by misreporting its resource requirement, a strategic behaviour commonly observed in production clusters. Prototype implementation and trace-driven simulations show that DRFH can be easily enforced in cluster systems, with higher resource utilization and shorter job completion time than the existing resource sharing policies.;Second, how should the active flows fairly share the network resources of middleboxes, software routers, and other appliances that are widely deployed in datacenters? In middleboxes or software routers, flows usually undergo deep packet inspection, which requires the support of multiple types of resources, and may bottleneck on either CPU, memory bandwidth, or link bandwidth. While there is rich literature of fair queueing for a single type of resource (i.e., link bandwidth), it remains unclear how to schedule multiple resources in middleboxes to achieve fair sharing among flows. A similar problem also arises in virtual machine (VM) scheduling inside a hypervisor, where different VMs may consume different amounts of resources, and it is desirable to fairly multiplex their access to physical resources.;To answer these challenges, this dissertation proposes Multi-resource Round Robin (MR3) that serves flows in rounds and achieves near-perfect fairness in O(1) time. MR3 serves as a foundation for a more general fair scheduler, called Group Multi-Resource Round Robin (GMR3). GMR3 also runs in O(1) time, yet provides weight-proportional packet latency when flows are assigned uneven weights.;This dissertation also identifies a new challenge that is unique to multi-resource scheduling: the general tradeoff between fairness and efficiency. Such a tradeoff has never been a problem for traditional fair sharing of link bandwidth. As long as the queueing algorithm is work conserving, the bandwidth is always fully utilized, and fairness is the only concern. However, in the presence of multiple resource types, fairness and efficiency are conflicting objectives that cannot be achieved at the same time. Motivated by this problem, a new queueing algorithm is proposed and prototyped. It allows network operator to flexibly specify her tradeoff preference and implements the specified tradeoff by determining the right packet scheduling order.
展开▼