Improving HDFS write performance using efficient replica placement


Abstract

Over the last half decade, network applications have grown tremendously; we are living in an era of information explosion in which large amounts of distributed data must be stored and managed. Distributed file systems (DFS) are designed to handle this kind of data. The major design issues in a DFS are scalability, fault tolerance, flexibility, and availability. The most prevalent DFS addressing these challenges is the Hadoop Distributed File System (HDFS), a variant of the Google File System (GFS). Apache Hadoop is able to address current Big Data problems by simplifying the implementation of data-intensive and highly parallel distributed applications. HDFS handles fault tolerance through data replication: it replicates each data block on different datanodes for reliability and availability. The existing HDFS implementation in Hadoop performs replication in a pipelined manner, which makes replication time-consuming. The proposed system is an alternative, parallel approach to replica placement in HDFS that aims to improve throughput. Experiments comparing it with the existing pipelined replication approach show that it improves HDFS write throughput by up to 10%, as measured by the TestDFSIO benchmark. The paper also analyzes how HDFS configuration parameters such as file block size and replication factor affect HDFS write performance under both approaches.
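To make the contrast between the two replication styles concrete, here is a minimal sketch in Python. It is not the paper's implementation and does not use any HDFS API: `DataNode`, `write_pipelined`, `write_parallel`, and `replica_count` are hypothetical names, datanodes are modeled as in-memory dictionaries, and a whole block is forwarded at once, whereas real HDFS streams a block through the pipeline in smaller packets. The sketch only shows the structural difference: a chain of forwards versus a concurrent fan-out from the client.

```python
from concurrent.futures import ThreadPoolExecutor


class DataNode:
    """Hypothetical in-memory stand-in for an HDFS datanode."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes

    def store(self, block_id, data):
        self.blocks[block_id] = data


def write_pipelined(block_id, data, nodes):
    """Pipelined style: the client sends to the first node only; each
    node stores the block and then forwards it to the next node."""
    if not nodes:
        return
    head, rest = nodes[0], nodes[1:]
    head.store(block_id, data)
    write_pipelined(block_id, data, rest)  # models head forwarding downstream


def write_parallel(block_id, data, nodes):
    """Parallel style (assumed here): the client sends the block to
    every replica target concurrently and waits for all of them."""
    if not nodes:
        return
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        futures = [pool.submit(node.store, block_id, data) for node in nodes]
        for future in futures:
            future.result()  # re-raise any error from a worker thread


def replica_count(block_id, nodes):
    """Number of nodes that currently hold the given block."""
    return sum(1 for node in nodes if block_id in node.blocks)


if __name__ == "__main__":
    replication_factor = 3  # plays the role of the HDFS replication factor
    chain = [DataNode(f"dn{i}") for i in range(replication_factor)]
    fanout = [DataNode(f"dn{i}") for i in range(replication_factor)]
    write_pipelined("blk_1", b"payload", chain)
    write_parallel("blk_1", b"payload", fanout)
    print(replica_count("blk_1", chain), replica_count("blk_1", fanout))
```

Both functions leave the same replicas on the same nodes; they differ only in who sends to whom and when. The sketch says nothing about relative speed, which in practice depends on network bandwidth, block size, and replication factor, the parameters the paper evaluates with TestDFSIO.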
