Conference: International Conference on Computer Science and Network Technology

Mitigate I/O Access Pattern Divergence With Heterogeneous Architecture In HDFS

Abstract

Hadoop has become the de facto big data processing framework. Following its success in processing large data sets, more and more companies and organizations build their analysis logic into the Hadoop stack and share it across development teams for cost-effectiveness. Hadoop now handles not only the original batch computation but also low-latency online queries. However, the hard disk drives (HDDs) used in the Hadoop storage system perform poorly under random requests and disk contention. The homogeneous HDD storage layer in HDFS therefore inevitably encounters I/O access pattern divergence. To this end, a promising trend in storage systems is to use hybrid, heterogeneous devices that include solid-state disks (SSDs), which can help achieve very high I/O rates at acceptable cost. However, previous work mostly focuses on separate data flow phases, which leads to poor applicability across application scenarios. In this paper, we present a novel heterogeneous architecture that separates I/O accesses into a sequential pattern and a random pattern. We mitigate the divergence of I/O access through a heterogeneous storage system: HDDs serve sequential I/O requests while SSDs provide low-latency random file access. We evaluate our system using an actual implementation on a medium-sized cluster consisting of HDDs and different numbers of SSDs, with workloads from a leading search engine company. Experiments show that our system outperforms the original system by 17% in disk utilization and reduces job duration by 12% on average.
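The abstract does not say how requests are classified, so the following is only a minimal sketch of the general idea of splitting I/O by access pattern. It assumes a simple offset-tracking heuristic: a request counts as sequential when it starts where the previous request on the same file ended. The class `PatternRouter`, its `max_gap` parameter, and the tier labels are hypothetical names introduced for illustration and are not taken from the paper or from HDFS.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical tier labels, used only in this illustration.
HDD = "HDD"
SSD = "SSD"


@dataclass
class PatternRouter:
    """Route each request to a tier based on an assumed offset heuristic."""

    # Assumed tolerance: largest forward gap (in bytes) between the end of
    # the previous request and the start of this one that still counts as
    # sequential. Zero means strictly contiguous.
    max_gap: int = 0
    # Per-file offset at which the next sequential request is expected.
    _next_offset: Dict[str, int] = field(default_factory=dict)

    def route(self, path: str, offset: int, length: int) -> str:
        # A file with no history is expected to be read from offset 0.
        expected = self._next_offset.get(path, 0)
        self._next_offset[path] = offset + length
        is_sequential = 0 <= offset - expected <= self.max_gap
        # Sequential requests go to HDD, everything else to SSD.
        return HDD if is_sequential else SSD


router = PatternRouter()
decisions = [
    router.route("part-00000", 0, 4096),        # start of a scan
    router.route("part-00000", 4096, 4096),     # contiguous with the previous read
    router.route("part-00000", 1_000_000, 512), # jump to a distant offset
    router.route("index-file", 8192, 512),      # first access, not at offset 0
]
print(decisions)
```

A production system would likely need more than this, for example a window over recent requests and a data placement decision made before reads arrive; the sketch only shows the routing decision itself.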
