Scientific Programming

Improving I/O Efficiency in Hadoop-Based Massive Data Analysis Programs

Abstract

Apache Hadoop has been a popular parallel processing tool in the era of big data. Although practitioners have rewritten many conventional analysis algorithms to tailor them to Hadoop, inefficient I/O in Hadoop-based programs has been reported repeatedly in the literature. In this article, we address the I/O inefficiency of Hadoop-based massive data analysis by introducing an I/O-efficient modification of Hadoop. We first incorporate a columnar data layout into the conventional Hadoop framework, without any modification of the Hadoop internals. We also provide Hadoop with an indexing capability that saves a large amount of I/O when processing not only selection predicates but also the star-join queries that are common in many analysis tasks.
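
The abstract does not spell out the storage format, but the I/O savings it describes come from two general techniques: storing each column in its own file so a query reads only the columns it projects, and keeping a lightweight index so data blocks that cannot satisfy a selection predicate are never scanned. The sketch below illustrates both ideas in plain Java with a per-block min/max index; all class and method names (ColumnStoreSketch, writeColumns, buildMinMaxIndex, matchingRowIds) are hypothetical and do not reflect the paper's implementation, which would place the column data in HDFS and read it through a custom InputFormat rather than local files.

    import java.io.BufferedWriter;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.ArrayList;
    import java.util.List;

    // Illustrative sketch only: a columnar layout plus a per-block min/max index.
    // Hypothetical names; not the storage format described in the paper.
    public class ColumnStoreSketch {

        static final int BLOCK_SIZE = 1000;  // rows covered by one index entry

        // Write each column of the table to its own file, one value per line,
        // so a query that projects two columns never touches the others.
        static void writeColumns(List<String[]> rows, String[] columnNames, Path dir) throws IOException {
            Files.createDirectories(dir);
            for (int c = 0; c < columnNames.length; c++) {
                try (BufferedWriter w = Files.newBufferedWriter(dir.resolve(columnNames[c] + ".col"))) {
                    for (String[] row : rows) {
                        w.write(row[c]);
                        w.newLine();
                    }
                }
            }
        }

        // Build a min/max index over an integer column: one {min, max} pair per block of rows.
        static List<int[]> buildMinMaxIndex(Path dir, String column) throws IOException {
            List<String> values = Files.readAllLines(dir.resolve(column + ".col"));
            List<int[]> index = new ArrayList<>();
            for (int start = 0; start < values.size(); start += BLOCK_SIZE) {
                int min = Integer.MAX_VALUE, max = Integer.MIN_VALUE;
                for (int i = start; i < Math.min(start + BLOCK_SIZE, values.size()); i++) {
                    int v = Integer.parseInt(values.get(i));
                    min = Math.min(min, v);
                    max = Math.max(max, v);
                }
                index.add(new int[]{min, max});
            }
            return index;
        }

        // Evaluate the selection predicate "column == target", consulting the index first.
        // Blocks whose [min, max] range cannot contain the target are skipped without parsing;
        // in an HDFS layout the reader would seek past them, saving the corresponding I/O.
        static List<Integer> matchingRowIds(Path dir, String column, int target, List<int[]> index) throws IOException {
            List<String> values = Files.readAllLines(dir.resolve(column + ".col"));
            List<Integer> rowIds = new ArrayList<>();
            for (int b = 0; b < index.size(); b++) {
                int[] range = index.get(b);
                if (target < range[0] || target > range[1]) continue;  // pruned by the index
                int end = Math.min((b + 1) * BLOCK_SIZE, values.size());
                for (int i = b * BLOCK_SIZE; i < end; i++) {
                    if (Integer.parseInt(values.get(i)) == target) rowIds.add(i);
                }
            }
            return rowIds;
        }

        public static void main(String[] args) throws IOException {
            Path dir = Files.createTempDirectory("colstore");
            List<String[]> rows = new ArrayList<>();
            for (int i = 0; i < 5000; i++) {
                rows.add(new String[]{String.valueOf(i), "user" + (i % 7), String.valueOf(20 + i % 50)});
            }
            writeColumns(rows, new String[]{"id", "name", "age"}, dir);
            List<int[]> idIndex = buildMinMaxIndex(dir, "id");  // index on the "id" column
            System.out.println("rows with id == 4242: " + matchingRowIds(dir, "id", 4242, idIndex));
        }
    }

A star-join over a fact table would apply the same index-based pruning to the foreign-key columns before joining with the dimension tables, but the abstract gives no further detail on that part, so it is not sketched here.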
