
Resource Scheduling in Geo-Distributed Computing


Abstract

Due to growing computing needs and increasing data volumes, cloud service providers deploy multiple datacenters around the world in order to provide fast computing response. Applications that use such geo-distributed deployments include web search, user behavior analysis, machine learning, and live camera feed processing. Depending on the characteristics of the applications, their data may be generated, stored, and processed across the geo-distributed sites. Hence, efficient processing of the data across the geo-distributed sites is critical to the applications' performance. Existing solutions first aggregate all the required data at one location and execute the computation within that site. Such solutions incur large amounts of data transfer across the WAN and prolong the applications' response times because of significant network delays. An emerging trend is to instead distribute the computation across the sites based on the data distribution, and aggregate only the results afterwards. Recent works have shown that such an approach can significantly improve response time and reduce WAN bandwidth usage. However, the performance of geo-distributed jobs depends heavily on how the resources are scheduled, which raises new challenges because trivial extensions of state-of-the-art scheduling solutions lead to sub-optimal performance. In this thesis, we first improve the performance of geo-distributed jobs from the perspective of computation resources. We provide insights into how conventional Shortest Remaining Processing Time (SRPT) scheduling falls short due to the lack of scheduling coordination among the sites, and propose a light-weight heuristic that significantly improves the jobs' response time. We also design a new job scheduling heuristic that coordinates the workload demands and the resource availability among the sites and greedily schedules the jobs that can finish quickly.
Trace-driven simulation studies show that our proposed scheduling heuristics reduce the response time of geo-distributed jobs by up to 50%. Next, we address the performance of geo-distributed jobs from the perspectives of both the computation and the network resources. Specifically, we address the scheduling challenges posed by heterogeneous resource availability across the sites and mismatched data distribution across the geo-distributed sites. We formulate the task placement decisions using a Linear Programming optimization model, and allocate the resources greedily to the jobs that can finish quickly. In addition to response time, our design can also easily incorporate other performance goals, e.g., fairness and WAN usage, with simple control knobs. The EC2-based deployment of our prototype and large-scale trace-driven simulations show that our solutions improve response time by up to 77% over a baseline in-place scheduling approach and by up to 55% over the state-of-the-art geo-distributed analytics solution. Finally, we expand to a more general setting in which each job has multiple configuration options, and its quality depends on the configuration it uses. We motivate this problem with the scenario of processing live camera feeds across hierarchical clusters. In this setting, we focus on the scheduling problem of jointly determining job configuration and placement for concurrent jobs, and design an efficient heuristic to maximize the overall quality with the available resources across the geo-distributed sites. Our evaluation based on an Azure deployment of our prototype shows that the proposed solution outperforms the state-of-the-art video analytics scheduler by up to 2.3x and the widely deployed Fair Scheduler by up to 15.7x in terms of the average quality of the concurrent jobs.
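
The abstract's remark that SRPT falls short without coordination among sites can be illustrated with a toy model. The sketch below is not the thesis's heuristic. It assumes one execution slot per site, all jobs arriving at time zero (so SRPT reduces to shortest-job-first), and a job that completes only when its work at every site is done. The function names, the two-job workload, and the "order by bottleneck site" rule are all hypothetical choices made for illustration.

```python
from typing import Dict, List

# Hypothetical workload: job -> site -> amount of work at that site.
Jobs = Dict[str, Dict[str, float]]


def completion_times(jobs: Jobs, site_orders: Dict[str, List[str]]) -> Dict[str, float]:
    """Run each site's queue sequentially; a job finishes when its last site finishes."""
    finish = {job: 0.0 for job in jobs}
    for site, order in site_orders.items():
        clock = 0.0
        for job in order:
            clock += jobs[job][site]
            finish[job] = max(finish[job], clock)
    return finish


def per_site_srpt(jobs: Jobs, sites: List[str]) -> Dict[str, List[str]]:
    """Each site independently runs its locally shortest job first."""
    return {
        site: sorted((j for j in jobs if site in jobs[j]),
                     key=lambda j: (jobs[j][site], j))
        for site in sites
    }


def coordinated(jobs: Jobs, sites: List[str]) -> Dict[str, List[str]]:
    """All sites share one order: smallest bottleneck (max per-site work) first.

    This ordering rule is an assumption for the sketch, not the thesis's design.
    """
    order = sorted(jobs, key=lambda j: (max(jobs[j].values()), j))
    return {site: [j for j in order if site in jobs[j]] for site in sites}


def average(finish: Dict[str, float]) -> float:
    return sum(finish.values()) / len(finish)


sites = ["s1", "s2"]
jobs: Jobs = {
    "A": {"s1": 2, "s2": 4},
    "B": {"s1": 3, "s2": 3},
}

independent_avg = average(completion_times(jobs, per_site_srpt(jobs, sites)))
coordinated_avg = average(completion_times(jobs, coordinated(jobs, sites)))
print(independent_avg)  # 6.0
print(coordinated_avg)  # 5.0
```

In this workload, site s1 favors job A while site s2 favors job B, so each job waits behind the other at one site and both finish late (times 7 and 5). Using one shared order lets job B finish at time 3 without delaying job A beyond time 7, which lowers the average completion time from 6.0 to 5.0.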

Bibliographic record

  • Author

    Hung, Chien-Chun

  • Author affiliation

    University of Southern California

  • Degree-granting institution: University of Southern California
  • Subject: Computer science
  • Degree: Ph.D.
  • Year: 2017
  • Pages: 173 p.
  • Total pages: 173
  • Original format: PDF
  • Language: English