
Data prefetching and file synchronizing for performance optimization in Hadoop-based hybrid cloud


Abstract

Driven by technical factors such as system reliability, bandwidth constraints, data confidentiality and security, as well as economic factors such as initial capital expenditure and recurring operating expenditure, today's cloud computing tends to adopt the hybrid cloud model. However, because hybrid clouds scale both numerically and geographically, network delay becomes the main constraint in remote file system access. To hide network latency and reduce job completion time in Hadoop-based hybrid cloud data access, two methods are proposed: a scheduling-aware data prefetching scheme that enhances the data locality of non-local map tasks in a Hadoop-based centralized hybrid cloud (CHCDLOS-Prefetch), and a file synchronizing method that decreases job execution delay in a Hadoop-based distributed hybrid cloud (DHCDLO-Sync). In the former, input data for non-local map tasks are fetched ahead of time to the target compute nodes by making use of idle network bandwidth. In the latter, which works at the level of job scheduling, highly popular data files are proactively synchronized among sub-clouds to strengthen intra-sub-cloud data locality in the distributed hybrid cloud. Extensive experimental results show that, compared to the Capacity, Fair and DARE algorithms, the proposed algorithms improve hybrid cloud performance more significantly in data locality and job completion time. (C) 2019 Elsevier Inc. All rights reserved.
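The abstract names the two ideas but does not give the algorithms, so the following Python sketch is only an illustration under stated assumptions, not the paper's CHCDLOS-Prefetch or DHCDLO-Sync. It assumes that a task is "non-local" when its assigned node holds no replica of its input block, that idle bandwidth can be modelled as a simple megabyte budget, and that file popularity is a plain access count. The names `MapTask`, `plan_prefetch` and `files_to_sync` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class MapTask:
    task_id: str
    block: str          # input block the task reads
    target_node: str    # node the scheduler assigned the task to
    block_size_mb: int


def plan_prefetch(tasks, block_locations, idle_bandwidth_mb):
    """Choose blocks to copy ahead of time for non-local tasks.

    Assumption: idle bandwidth is a fixed budget in MB, and tasks are
    considered in the order given (a greedy, first-fit choice).
    """
    plan = []
    budget = idle_bandwidth_mb
    for task in tasks:
        holders = block_locations.get(task.block, set())
        is_local = task.target_node in holders
        if is_local or task.block_size_mb > budget:
            continue  # nothing to fetch, or not enough idle bandwidth left
        plan.append((task.block, task.target_node))
        budget -= task.block_size_mb
    return plan


def files_to_sync(access_log, top_k):
    """Return the top_k most accessed files.

    Assumption: popularity is the raw access count; ties are broken by
    file name so the result is deterministic.
    """
    counts = Counter(access_log)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_k]]


if __name__ == "__main__":
    tasks = [
        MapTask("t1", "blk_a", "node1", 128),
        MapTask("t2", "blk_b", "node2", 128),
        MapTask("t3", "blk_c", "node1", 128),
    ]
    locations = {"blk_a": {"node1"}, "blk_b": {"node3"}, "blk_c": {"node2"}}
    print(plan_prefetch(tasks, locations, idle_bandwidth_mb=200))
    print(files_to_sync(["f1", "f2", "f1", "f3", "f2", "f1"], top_k=2))
```

In this toy run, task t1 is already local, t2 is prefetched, and t3 is skipped because the remaining budget is too small. A real scheduler would also have to weigh task start times and replica placement, which the abstract does not describe.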

Bibliographic details

  • Source
    The Journal of Systems and Software | 2019, Issue 5 | pp. 133-149 | 17 pages
  • Author affiliations

    Wuhan Univ Technol, Sch Comp Sci & Technol, Wuhan 430063, Hubei, Peoples R China|Beijing Technol & Business Univ, Beijing Key Lab Big Data Technol Food Safety, Beijing, Peoples R China|Xian Univ Posts & Telecommun, Shaanxi Key Lab Network Data Anal & Intelligent P, Xian 710121, Shaanxi, Peoples R China;

    Wuhan Univ Technol, Sch Comp Sci & Technol, Wuhan 430063, Hubei, Peoples R China|Huanghuai Univ, Int Coll, Zhumadian 463000, Peoples R China;

    Beijing Technol & Business Univ, Beijing Key Lab Big Data Technol Food Safety, Beijing, Peoples R China;

    Wuhan Univ Technol, Sch Comp Sci & Technol, Wuhan 430063, Hubei, Peoples R China;

  • Indexing information
  • Original format: PDF
  • Text language: English
  • Chinese Library Classification
  • Keywords

    Data prefetching; File synchronizing; Hybrid cloud;

