V for Vicissitude: The Challenge of Scaling Complex Big Data Workflows

Conference: IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing



Abstract

In this paper, we present the scaling of BTWorld, our MapReduce-based approach to observing and analyzing the global BitTorrent network, which we have been monitoring for the past 4 years. BTWorld currently provides a comprehensive and complex set of queries implemented in Pig Latin, with data dependencies between them. These queries translate to several MapReduce jobs whose execution times and input sizes both follow heavy-tailed distributions. Processing BitTorrent data in excess of 1 TB with our BTWorld workflow required an in-depth analysis of the entire software stack and the design of a complete optimization cycle. We analyze our system from both theoretical and experimental perspectives, and we show how we attained a scale of data processing 15 times larger than in our previous results.
