
Optimizing big data processing performance in the public cloud: opportunities and approaches



Abstract

Today's lightning-fast data generation from massive sources calls for efficient big data processing, which imposes unprecedented demands on computing and networking infrastructures. State-of-the-art tools, most notably MapReduce, generally run on dedicated server clusters to exploit data parallelism. For grassroots users or non-computing professionals, the cost of deploying and maintaining large-scale dedicated server clusters can be prohibitively high, not to mention the technical skills involved. Public clouds, on the other hand, allow general users to rent virtual machines (VMs) and run their applications in a pay-as-you-go manner, with ultra-high scalability and minimal upfront costs. This new computing paradigm has gained tremendous success in recent years, becoming a highly attractive alternative to dedicated server clusters. This article discusses the critical challenges and opportunities when big data meets the public cloud. We identify the key differences between running big data processing in a public cloud and in dedicated server clusters. We then present two important problems for efficient big data processing in the public cloud: resource provisioning (i.e., how to rent VMs) and VM-MapReduce job/task scheduling (i.e., how to run MapReduce after the VMs are constructed). Each of these two questions has a set of problems to solve. We present solution approaches for certain problems and offer optimized design guidelines for others. Finally, we discuss our implementation experiences.

Bibliographic details

  • Source
    Network, IEEE | 2015, Issue 5 | pp. 31-35 | 5 pages
  • Authors

    Wang Dan; Liu Jiangchuan;

  • Author affiliation

    Department of Computing, Hong Kong Polytechnic University;

  • Original format: PDF
  • Text language: English
