Journal: Distributed and Parallel Databases

Locality-aware allocation of multi-dimensional correlated files on the cloud platform


Abstract

The effective management of enormous data volumes on the Cloud platform has attracted dedicated research efforts. In this paper, we study the problem of allocating files with multi-dimensional correlations on the Cloud platform, such that files can be retrieved and processed more efficiently. Currently, most prevailing Cloud file systems allocate data following the principles of fault tolerance and availability, while inter-file correlations, i.e., the ways in which files relate to one another, are often neglected. In practice, however, data files are commonly correlated in various ways, and correlated files are most likely to be involved in the same computation. This raises a new challenge: allocating files with multi-dimensional correlations while taking "subspace locality" into consideration, in order to improve system throughput. We propose two allocation methods for multi-dimensional correlated files stored on the Cloud platform, such that I/O efficiency and data access locality are improved in the MapReduce processing paradigm without compromising the fault tolerance and availability properties of the underlying file system. Unlike the techniques proposed in [1,2], which quickly map the locations of the desired data for a given query \(\mathcal{Q}\), we focus on improving system throughput for batch jobs over correlated data files. We formulate the problem and study a series of solutions on HDFS [9]. Evaluations with real application scenarios demonstrate the effectiveness of our proposals: significant I/O and network costs can be saved during data retrieval and processing. For batch OLAP jobs in particular, our solution achieves a well-balanced workload among distributed computing nodes.
