Conference: International Conference on Very Large Data Bases

Hadoop's Adolescence: An analysis of Hadoop usage in scientific workloads

Abstract

We analyze Hadoop workloads from three different research clusters from a user-centric perspective. The goal is to better understand how data scientists use the system and how well that use matches the system's design. Our analysis suggests that Hadoop usage is still in its adolescence. We see underuse of Hadoop features, extensions, and tools. We see significant diversity in resource usage and application styles, including some interactive and iterative workloads, which motivates new tools in the ecosystem. We also observe significant opportunities to optimize these workloads. We find that job customization and configuration are used only within a narrow scope, which suggests that automatic tuning systems are worth pursuing. Overall, we present the first user-centered measurement study of Hadoop and find significant opportunities to help data scientists use it more efficiently.