
Dynamic core affinity for high-performance file upload on Hadoop Distributed File System


Abstract

The MapReduce programming model, in which data nodes perform both data storage and computation, was introduced for big-data processing. We therefore need to understand the different resource requirements of data-storing and computation tasks and schedule them efficiently over multi-core processors. In particular, providing high-performance data storage has become more critical because of the continuously increasing volume of data uploaded to distributed file systems and database servers. However, analyzing the performance characteristics of the processes that store upstream data is intricate, because both network and disk input/output (I/O) are heavily involved in their operation. In this paper, we analyze the impact of core affinity on both network and disk I/O performance and propose a novel dynamic core-affinity approach for high-throughput file upload. We consider the dynamic changes in processor load and the intensity of file uploads at run time, and accordingly decide the core affinity of service threads so as to maximize parallelism, data locality, and resource efficiency. We apply the dynamic core affinity to the Hadoop Distributed File System (HDFS). Measurement results show that our implementation improves the file-upload throughput of end applications by more than 30% compared with the default HDFS, and provides better scalability.
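The paper's run-time scheduler itself is not reproduced here; as a minimal sketch of the primitive such an approach builds on, the Python fragment below pins the calling service thread to the least-loaded core. The load samples and the `pick_least_loaded_core` heuristic are hypothetical illustrations, not the authors' algorithm, and `os.sched_setaffinity` is available only on Linux.

```python
import os

def pick_least_loaded_core(core_loads):
    """Hypothetical heuristic: return the index of the least-loaded core.

    core_loads: one load estimate per core (higher = busier).
    """
    return min(range(len(core_loads)), key=core_loads.__getitem__)

def pin_current_thread(core):
    """Pin the calling thread to a single core (Linux-only syscall wrapper)."""
    if hasattr(os, "sched_setaffinity") and core in os.sched_getaffinity(0):
        os.sched_setaffinity(0, {core})

# Illustrative (made-up) per-core load samples: a new upload service thread
# would be pinned to core 1, currently the least-loaded core.
target = pick_least_loaded_core([0.9, 0.1, 0.5])
pin_current_thread(target)
```

A real scheduler in this vein would refresh the per-core load estimates continuously and also weigh data locality (e.g. which core handled the network interrupt for a given stream), rather than considering load alone.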

Bibliographic Information

  • Source
    Parallel Computing | 2014, Issue 10 | pp. 722-737 | 16 pages
  • Author Affiliations

    Dept. of Computer Science and Engineering, Konkuk University, Seoul, Republic of Korea;

    Dept. of Computer Science and Engineering, Konkuk University, Seoul, Republic of Korea;

    Center for Experimental Research in Computer Systems, Georgia Institute of Technology, Atlanta, GA, USA;

    Center for Experimental Research in Computer Systems, Georgia Institute of Technology, Atlanta, GA, USA;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Format: PDF
  • Language: English (eng)
  • Keywords

    Affinity; Big-data; Multi-core; Hadoop Distributed File System; Process scheduling;


