Improving MapReduce Performance through Complexity and Performance Based Data Placement in Heterogeneous Hadoop Clusters

Abstract

MapReduce has emerged as an important programming model for clusters with tens of thousands of nodes. Hadoop, an open source implementation of MapReduce, may run on clusters whose nodes are heterogeneous in their computing capacity for various reasons. It is therefore important for data placement algorithms to partition the input and intermediate data according to the computing capacities of the nodes in the cluster. We propose several enhancements to the data placement algorithms in Hadoop so that the load is distributed evenly across the nodes. First, we propose two techniques to measure the computing capacities of the nodes. Second, we propose improvements to the input data distribution algorithm based on the complexities of the map and reduce functions and the measured heterogeneity of the nodes. Finally, we evaluate the resulting improvement in MapReduce performance.
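The abstract does not spell out the placement algorithm, so the following is only a minimal sketch of the general idea of capacity-proportional placement, not the authors' method. It assumes each node already has a measured capacity score (the function name `allocate_blocks`, the node names, and the scores are all hypothetical) and splits a number of input blocks in proportion to those scores, using largest-remainder rounding so that the counts sum to the total.

```python
def allocate_blocks(total_blocks, capacities):
    """Split total_blocks across nodes in proportion to their capacity.

    capacities maps a (hypothetical) node name to a positive capacity
    score, e.g. a measured processing rate. Returns a dict mapping each
    node name to an integer block count.
    """
    total_capacity = sum(capacities.values())
    if total_capacity <= 0:
        raise ValueError("capacities must sum to a positive value")

    # Ideal fractional share of blocks for each node.
    ideal = {node: total_blocks * cap / total_capacity
             for node, cap in capacities.items()}

    # Start from the integer part of each share.
    alloc = {node: int(share) for node, share in ideal.items()}

    # Give the leftover blocks to the nodes with the largest remainders.
    leftover = total_blocks - sum(alloc.values())
    by_remainder = sorted(capacities,
                          key=lambda node: ideal[node] - alloc[node],
                          reverse=True)
    for node in by_remainder[:leftover]:
        alloc[node] += 1
    return alloc
```

For instance, under these assumptions a node with three times the capacity score of another would receive three times as many blocks, which is the kind of even load (in time, not in data volume) that the abstract aims for.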

Bibliographic record

Source: International Conference on Distributed Computing and Internet Technologies, 2013, pp. 115-125 (11 pages)
Authors: Rajashekhar M. Arasanal; Daanish U. Rumani
Original format: PDF
Keywords: Data processing; MapReduce; Heterogeneous cluster; Hadoop

Similar literature

1. SHadoop: Improving MapReduce performance by optimizing job execution mechanism in Hadoop clusters [J]. Rong Gu, Xiaoliang Yang, Jinshuang Yan. Journal of Parallel and Distributed Computing, 2014, No. 3.
2. High Performance Computation of Big Data: Performance Optimization Approach towards a Parallel Frequent Item Set Mining Algorithm for Transaction Data based on Hadoop MapReduce Framework [J]. Guru Prasad M S, Nagesh H R, Swathi Prabhu. International Journal of Intelligent Systems and Applications, 2017, No. 1.
3. A Study and Performance Comparison of MapReduce and Apache Spark on Twitter Data on Hadoop Cluster [J]. Nowraj Farhan, Ahsan Habib, Arshad Ali. International Journal of Information Technology and Computer Science, 2018, No. 7.
4. Improving MapReduce Performance through Complexity and Performance Based Data Placement in Heterogeneous Hadoop Clusters [C]. Rajashekhar M. Arasanal, Daanish U. Rumani. International Conference on Distributed Computing and Internet Technologies, 2013.
5. Improving Hadoop performance by using metadata of related jobs in text datasets via enhancing MapReduce workflow [D]. Alshammari, Hamoud. 2016.
6. Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce [O]. Ablimit Aji, Fusheng Wang, Hoang Vo.
7. Improving MapReduce Performance through Data Placement in Heterogeneous Hadoop Clusters [O]. Jiong Xie, Shu Yin, Xiaojun Ruan. 2011.
