Journal of Computer Applications (计算机应用)

A MapReduce-Based Programming Model for Importing Big Tables into Hadoop

Abstract

To address the instability and low efficiency that Sqoop exhibits when importing big tables from a relational database into the Hadoop Distributed File System (HDFS), a new programming model based on the MapReduce framework was designed and implemented. The model's table-splitting algorithm works as follows: the total number of records in the table is divided by the number of mappers to obtain a step length, and the SQL query for each split is then constructed from a starting row index and a span equal to that step, so that every mapper performs exactly the same amount of import work. In the map phase, each mapper receives a single key-value pair whose key is the SQL statement corresponding to its split (the value is null), and the query is executed inside the map function, so each mapper calls the map function only once. Comparison experiments show that two big tables with the same number of records take essentially the same time to import regardless of how their records are distributed, and that importing the same table with different splitting fields also takes the same time; for a given big table, the model imports data significantly faster than Sqoop.
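The splitting step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the table name, the MySQL-style `LIMIT offset, span` paging clause, and the handling of the final remainder split are all assumptions introduced here for clarity.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the split algorithm from the abstract: the table's total record
// count is divided by the mapper count to get a step length, and each split
// becomes one SQL statement covering [start, start + span) rows.
public class BigTableSplitter {

    // Build one SQL query per mapper. The paper makes every split exactly one
    // step long; here the last split absorbs any remainder rows (an assumption,
    // so that every row is covered when totalRows is not divisible evenly).
    public static List<String> buildSplitQueries(String table, long totalRows, int numMappers) {
        long step = totalRows / numMappers;          // rows per mapper (the step length)
        List<String> queries = new ArrayList<>();
        for (int i = 0; i < numMappers; i++) {
            long start = i * step;                   // starting row index of this split
            long span = (i == numMappers - 1) ? totalRows - start : step;
            // MySQL-style paging; other databases would need a different clause.
            queries.add("SELECT * FROM " + table + " LIMIT " + start + ", " + span);
        }
        return queries;
    }
}
```

Each generated statement would then be handed to one mapper as the key of its single key-value pair, so the mapper runs its query once inside the map function and writes the rows to HDFS.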
