Conference: International Conference on Very Large Data Bases

Interactive Analytical Processing in Big Data Systems: A Cross-Industry Study of MapReduce Workloads


Abstract

Within the past few years, organizations in diverse industries have adopted MapReduce-based systems for large-scale data processing. Along with these new users, important new workloads have emerged which feature many small, short, and increasingly interactive jobs in addition to the large, long-running batch jobs for which MapReduce was originally designed. As interactive, large-scale query processing is a strength of the RDBMS community, it is important that lessons from that field be carried over and applied where possible in this new domain. However, these new workloads have not yet been described in the literature. We fill this gap with an empirical analysis of MapReduce traces from six separate business-critical deployments inside Facebook and at Cloudera customers in e-commerce, telecommunications, media, and retail. Our key contribution is a characterization of new MapReduce workloads which are driven in part by interactive analysis, and which make heavy use of query-like programming frameworks on top of MapReduce. These workloads display diverse behaviors which invalidate prior assumptions about MapReduce such as uniform data access, regular diurnal patterns, and prevalence of large jobs. A secondary contribution is a first step towards creating a TPC-like data processing benchmark for MapReduce.
