Scientific Programming

Improving I/O Efficiency in Hadoop-Based Massive Data Analysis Programs

Abstract

Apache Hadoop has been a popular parallel processing tool in the era of big data. Although practitioners have rewritten many conventional analysis algorithms to tailor them to Hadoop, inefficient I/O in Hadoop-based programs has been reported repeatedly in the literature. In this article, we address the I/O inefficiency of Hadoop-based massive data analysis by introducing an I/O-efficient modification of Hadoop. We first incorporate a columnar data layout into the conventional Hadoop framework, without any modification of the Hadoop internals. We also provide Hadoop with an indexing capability that saves a large amount of I/O when processing not only selection predicates but also the star-join queries that are common in many analysis tasks.
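
The abstract does not spell out the storage format, but the I/O savings it describes come from two general techniques: storing each column in its own file so a query reads only the columns it projects, and keeping a lightweight index so data blocks that cannot satisfy a selection predicate are never scanned. The sketch below illustrates both ideas in plain Java with a per-block min/max index; all class and method names (ColumnStoreSketch, writeColumns, buildMinMaxIndex, matchingRowIds) are hypothetical and do not reflect the paper's implementation, which would place the column data in HDFS and read it through a custom InputFormat rather than local files.

    import java.io.BufferedWriter;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.ArrayList;
    import java.util.List;

    // Illustrative sketch only: a columnar layout plus a per-block min/max index.
    // Hypothetical names; not the storage format described in the paper.
    public class ColumnStoreSketch {

        static final int BLOCK_SIZE = 1000;  // rows covered by one index entry

        // Write each column of the table to its own file, one value per line,
        // so a query that projects two columns never touches the others.
        static void writeColumns(List<String[]> rows, String[] columnNames, Path dir) throws IOException {
            Files.createDirectories(dir);
            for (int c = 0; c < columnNames.length; c++) {
                try (BufferedWriter w = Files.newBufferedWriter(dir.resolve(columnNames[c] + ".col"))) {
                    for (String[] row : rows) {
                        w.write(row[c]);
                        w.newLine();
                    }
                }
            }
        }

        // Build a min/max index over an integer column: one {min, max} pair per block of rows.
        static List<int[]> buildMinMaxIndex(Path dir, String column) throws IOException {
            List<String> values = Files.readAllLines(dir.resolve(column + ".col"));
            List<int[]> index = new ArrayList<>();
            for (int start = 0; start < values.size(); start += BLOCK_SIZE) {
                int min = Integer.MAX_VALUE, max = Integer.MIN_VALUE;
                for (int i = start; i < Math.min(start + BLOCK_SIZE, values.size()); i++) {
                    int v = Integer.parseInt(values.get(i));
                    min = Math.min(min, v);
                    max = Math.max(max, v);
                }
                index.add(new int[]{min, max});
            }
            return index;
        }

        // Evaluate the selection predicate "column == target", consulting the index first.
        // Blocks whose [min, max] range cannot contain the target are skipped without parsing;
        // in an HDFS layout the reader would seek past them, saving the corresponding I/O.
        static List<Integer> matchingRowIds(Path dir, String column, int target, List<int[]> index) throws IOException {
            List<String> values = Files.readAllLines(dir.resolve(column + ".col"));
            List<Integer> rowIds = new ArrayList<>();
            for (int b = 0; b < index.size(); b++) {
                int[] range = index.get(b);
                if (target < range[0] || target > range[1]) continue;  // pruned by the index
                int end = Math.min((b + 1) * BLOCK_SIZE, values.size());
                for (int i = b * BLOCK_SIZE; i < end; i++) {
                    if (Integer.parseInt(values.get(i)) == target) rowIds.add(i);
                }
            }
            return rowIds;
        }

        public static void main(String[] args) throws IOException {
            Path dir = Files.createTempDirectory("colstore");
            List<String[]> rows = new ArrayList<>();
            for (int i = 0; i < 5000; i++) {
                rows.add(new String[]{String.valueOf(i), "user" + (i % 7), String.valueOf(20 + i % 50)});
            }
            writeColumns(rows, new String[]{"id", "name", "age"}, dir);
            List<int[]> idIndex = buildMinMaxIndex(dir, "id");  // index on the "id" column
            System.out.println("rows with id == 4242: " + matchingRowIds(dir, "id", 4242, idIndex));
        }
    }

A star-join over a fact table would apply the same index-based pruning to the foreign-key columns before joining with the dimension tables, but the abstract gives no further detail on that part, so it is not sketched here.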
