Choosing Optimal Maintenance Time for Stateless Data-Processing Clusters: A Case Study of Hadoop Cluster

Abstract

Stateless clusters such as Hadoop clusters are widely deployed to drive business data analysis. When a cluster must be restarted for cluster-wide maintenance, administrators want a maintenance window that (1) disturbs cluster operation the least and (2) maximizes job processing throughput. A straightforward but naive approach is to choose the maintenance time with the fewest running jobs, but this approach is suboptimal. In this work, we use Hadoop as a use case and propose determining the optimal cluster maintenance time based on accumulated job progress, as opposed to the number of running jobs. The approach maximizes the job throughput of a stateless cluster by minimizing the amount of work lost to maintenance. Compared with the straightforward approach, the proposed approach can save up to 50% of the cluster resources wasted by maintenance, according to production cluster traces.
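The difference between the two selection criteria can be illustrated with a minimal sketch. The abstract does not specify the trace format or how job progress is measured, so everything here is an assumption: the `Job` record, the toy trace, the candidate times, and the approximation of a job's accumulated progress by its elapsed runtime at restart.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Job:
    """Hypothetical trace record; times are minutes from an arbitrary origin."""
    start: int  # time the job began running
    end: int    # time the job would finish if never interrupted


def running_jobs(jobs: Iterable[Job], t: int) -> List[Job]:
    """Jobs that are in progress at time t."""
    return [j for j in jobs if j.start <= t < j.end]


def job_count(jobs: List[Job], t: int) -> int:
    """Naive criterion: how many jobs a restart at time t would interrupt."""
    return len(running_jobs(jobs, t))


def lost_work(jobs: List[Job], t: int) -> int:
    """Progress-based criterion (assumed metric): total elapsed runtime
    discarded if the stateless cluster restarts at time t."""
    return sum(t - j.start for j in running_jobs(jobs, t))


def best_time(candidates: Iterable[int], cost: Callable[[int], int]) -> int:
    """Candidate time with the lowest cost."""
    return min(candidates, key=cost)


# Toy trace: one long job near completion, then three freshly started jobs.
trace = [Job(0, 100), Job(100, 130), Job(100, 140), Job(102, 125)]
candidates = [90, 105]

naive_choice = best_time(candidates, lambda t: job_count(trace, t))
progress_choice = best_time(candidates, lambda t: lost_work(trace, t))

print("fewest running jobs ->", naive_choice, "lost work:", lost_work(trace, naive_choice))
print("least accumulated progress ->", progress_choice, "lost work:", lost_work(trace, progress_choice))
```

In this toy trace the job-count criterion restarts at minute 90, interrupting a single job but discarding 90 minutes of work, while the progress criterion restarts at minute 105, interrupting three jobs that have only 13 minutes of work between them.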

Bibliographic details

Source: International Workshop on Job Scheduling Strategies for Parallel Processing, 2017, pp. 252-273 (22 pages)
Authors: Zhenyun Zhuang; Min Shen; Haricharan Ramachandra; Suja Viswesan
Original format: PDF
