Choosing Optimal Maintenance Time for Stateless Data-Processing Clusters: A Case Study of Hadoop Cluster

Abstract

Stateless clusters such as Hadoop clusters are widely deployed to drive business data analysis. When a cluster needs to be restarted for cluster-wide maintenance, administrators want to choose a maintenance window that results in (1) the least disturbance to cluster operation and (2) maximized job-processing throughput. A straightforward but naive approach is to choose the maintenance time with the fewest running jobs, but such an approach is suboptimal. In this work, we use Hadoop as a use case and propose to determine the optimal cluster maintenance time based on the accumulated job progress, as opposed to the number of running jobs. The approach can maximize the job throughput of a stateless cluster by minimizing the amount of work lost due to maintenance. Compared to the straightforward approach, the proposed approach can save up to 50% of the cluster resources wasted by maintenance, according to production cluster traces.
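The contrast between the two selection criteria can be illustrated with a small sketch. This is not the paper's implementation: the trace format, the function names, and the use of elapsed run time as a proxy for accumulated job progress are all assumptions made here for illustration, and every job is assumed to hold one unit of resources.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical trace format: (start_time, end_time) per job.
Job = Tuple[int, int]


def running_jobs(jobs: List[Job], t: int) -> List[Job]:
    """Jobs that are in flight at time t (started, not yet finished)."""
    return [(s, e) for s, e in jobs if s <= t < e]


def job_count(jobs: List[Job], t: int) -> int:
    """Naive cost: number of jobs a restart at time t would kill."""
    return len(running_jobs(jobs, t))


def lost_work(jobs: List[Job], t: int) -> int:
    """Progress-based cost: total elapsed run time discarded by a restart
    at time t (assumed proxy for accumulated job progress)."""
    return sum(t - s for s, _ in running_jobs(jobs, t))


def best_time(candidates: Iterable[int], cost: Callable[[int], int]) -> int:
    """Candidate time with the lowest cost (earliest one on ties)."""
    return min(candidates, key=cost)


# Toy trace: one long job followed by two short overlapping jobs.
jobs = [(0, 12), (10, 20), (10, 20)]
candidates = range(6, 20)

naive_t = best_time(candidates, lambda t: job_count(jobs, t))
progress_t = best_time(candidates, lambda t: lost_work(jobs, t))

print("naive choice:", naive_t, "lost work:", lost_work(jobs, naive_t))
print("progress-based choice:", progress_t, "lost work:", lost_work(jobs, progress_t))
```

In this toy trace the job-count criterion picks a time when only one job is running, but that job has already run for a long time, so more work is discarded than at a later time when two freshly started jobs are running.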

Bibliographic details

Source
International Workshop on Job Scheduling Strategies for Parallel Processing | 2017 | 278p | 22 pages in total
Authors
Zhenyun Zhuang; Min Shen; Haricharan Ramachandra; Suja Viswesan
Original format: PDF
Chinese Library Classification: TP316.4-53
