ACM SIGMOD International Conference on Management of Data

ParaTimer: A Progress Indicator for MapReduce DAGs


Abstract

Time-oriented progress estimation for parallel queries is a challenging problem that has received only limited attention. In this paper, we present ParaTimer, a new type of time-remaining indicator for parallel queries. Several parallel data processing systems exist; ParaTimer targets environments where declarative queries are translated into ensembles of MapReduce jobs. ParaTimer builds on previous techniques and makes two key contributions. First, it estimates the progress of queries that translate into directed acyclic graphs (DAGs) of MapReduce jobs, where jobs on different paths can execute concurrently (unlike prior work, which considered only sequences of jobs). For such queries, we use a new type of critical-path-based progress-estimation approach. Second, ParaTimer handles a variety of real-system challenges such as failures and data skew. To handle unexpected changes in query execution times caused by changing runtime conditions, ParaTimer provides users with not just one but a set of time-remaining estimates, each corresponding to a different, carefully selected scenario. We implement our estimator in the Pig system and demonstrate its performance through experiments on a real, small-scale cluster.
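The general critical-path idea behind estimating time remaining for a DAG of concurrent jobs can be illustrated with a minimal sketch. It assumes that each job's remaining time is already known (a hypothetical input, e.g. derived from per-job progress) and that jobs on different paths run fully in parallel without competing for cluster slots. It is not ParaTimer's actual estimator; the function name `critical_path` and the `jobs` layout are illustrative only.

```python
def critical_path(jobs):
    """Estimate the remaining query time as the longest path through a job DAG.

    jobs: hypothetical mapping of job name -> (remaining_seconds, [prerequisite names]).
    Assumes the graph is acyclic and that independent jobs run concurrently.
    Returns (remaining_time, jobs_on_the_critical_path).
    """
    if not jobs:
        return 0.0, []

    finish = {}  # job -> estimated finish time, measured from now
    pred = {}    # job -> prerequisite that delays it the most (None if none)

    def visit(name):
        if name in finish:
            return finish[name]
        remaining, deps = jobs[name]
        best_dep, start = None, 0.0
        # A job can start only after its slowest prerequisite finishes.
        for dep in deps:
            t = visit(dep)
            if t > start:
                best_dep, start = dep, t
        finish[name] = start + remaining
        pred[name] = best_dep
        return finish[name]

    # The query finishes when the latest-finishing job finishes.
    end = max(jobs, key=visit)

    # Walk back through the delaying prerequisites to recover the critical path.
    path, node = [], end
    while node is not None:
        path.append(node)
        node = pred[node]
    path.reverse()
    return finish[end], path


# Example: two scans run concurrently, feed a join, then an aggregation.
jobs = {
    "scan_a": (30, []),
    "scan_b": (50, []),
    "join": (20, ["scan_a", "scan_b"]),
    "agg": (10, ["join"]),
}
print(critical_path(jobs))  # (80.0, ['scan_b', 'join', 'agg'])
```

Taking the longest path rather than the sum of all remaining work reflects the point made above: jobs on different DAG paths overlap, so only the slowest chain determines when the query completes. Under real conditions (shared slots, failures, skew) this single number would need adjusting, which is why a set of scenario-based estimates is useful.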