Dynamic Random Access for Hadoop Distributed File System

Abstract

The Hadoop Distributed File System (HDFS) is widely used to manage large-scale data because of its high scalability. HDFS natively supports sequential queries, which are the most common queries in applications. However, many applications also need to run random queries over large-scale data, and such queries are becoming increasingly important. HDFS is not optimized for random reads, so random access to HDFS suffers from several disadvantages. In this paper, we present three methods that address these issues, optimizing random access to HDFS while preserving sequential access performance: 1) dynamically setting the size of the data packets used in transmission, 2) reusing TCP connections for localized random accesses, and 3) directing random accesses to the same server to make full use of existing TCP connections. Experimental evaluations on real-world data show that, compared with the original methods, our solutions efficiently support both sequential and random access.
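The three methods listed in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' implementation and it does not use any Hadoop API: the names `choose_packet_size`, `ConnectionCache`, and `pick_server`, the 512-byte and 64 KiB bounds, and the power-of-two sizing rule are all assumptions made for illustration only.

```python
# Illustrative sketch only: hypothetical names and parameters, no Hadoop calls.

def choose_packet_size(request_len, min_size=512, max_size=64 * 1024):
    """Method 1 (assumed rule): size the packet to the request.

    Returns the smallest power-of-two multiple of min_size that covers
    request_len, capped at max_size. Small random reads then get small
    packets, while long sequential reads keep the large packet size.
    """
    size = min_size
    while size < request_len and size < max_size:
        size *= 2
    return min(size, max_size)


class ConnectionCache:
    """Method 2: keep one open connection per server and reuse it."""

    def __init__(self, connect):
        self._connect = connect   # callable(server) -> connection object
        self._open = {}           # server name -> open connection
        self.opened = 0           # number of connections actually created

    def has(self, server):
        return server in self._open

    def get(self, server):
        conn = self._open.get(server)
        if conn is None:
            conn = self._connect(server)
            self._open[server] = conn
            self.opened += 1
        return conn


def pick_server(replicas, cache):
    """Method 3: prefer a replica holder that already has an open connection."""
    for server in replicas:
        if cache.has(server):
            return server
    return replicas[0]


if __name__ == "__main__":
    cache = ConnectionCache(connect=lambda server: object())
    # Each entry lists the servers holding a replica of the block being read.
    reads = [["dn1", "dn2"], ["dn2", "dn3"], ["dn3", "dn1"]]
    for replicas in reads:
        server = pick_server(replicas, cache)
        cache.get(server)
        print(server, choose_packet_size(4000))
    print("connections opened:", cache.opened)
```

In this toy run the third read is routed to `dn1` because a connection to it is already open, so three reads need only two connections. A real client would also have to handle connection timeouts and server load, which the sketch ignores.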
