Concentric layout, a new scientific data layout for matrix data-set in Hadoop file system

Abstract

Due to the explosive growth in the size of scientific data-sets, data-intensive computing and analysis are an emerging trend in computational science. In these applications, data pre-processing is widely adopted because it can optimise the data layout or format beforehand to facilitate future data access. Meanwhile, current research shows the increasing popularity of the MapReduce framework for large-scale data processing. However, the data access patterns generally applied to scientific data-sets are not directly supported by the current MapReduce framework. This gap motivates us to provide support for these scientific data access patterns in the MapReduce framework. In our work, we study the data access patterns in matrix files and propose a new concentric data layout to facilitate matrix data access and analysis in the MapReduce framework. The concentric data layout maintains the dimensional property at the chunk level. In contrast to the continuous data layout adopted in the current Hadoop framework, the concentric data layout stores the data from the same sub-matrix in one chunk. This layout can guarantee that the average performance of data access is optimal regardless of the various access patterns. The concentric data layout requires reorganising the data before it is analysed or processed. Our experiments are run on a real-world halo-finding application; the results indicate that the concentric data layout improves overall performance by up to 38%.
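
The contrast between a continuous layout and a layout that keeps each sub-matrix in one chunk can be illustrated with a toy model. The sketch below is not the authors' algorithm, whose details the abstract does not give. It assumes a small 8x8 matrix, a chunk size of 16 elements, a row-major continuous layout, and square 4x4 sub-matrices; all function names are hypothetical. It only counts how many chunks each access pattern touches.

```python
def row_major_chunks(n_rows, n_cols, chunk_elems):
    """Map each cell (r, c) to a chunk id under a continuous row-major layout."""
    return {(r, c): (r * n_cols + c) // chunk_elems
            for r in range(n_rows) for c in range(n_cols)}


def tiled_chunks(n_rows, n_cols, tile):
    """Map each cell to a chunk id when every tile x tile sub-matrix is one chunk."""
    tiles_per_row = -(-n_cols // tile)  # ceiling division
    return {(r, c): (r // tile) * tiles_per_row + (c // tile)
            for r in range(n_rows) for c in range(n_cols)}


def chunks_touched(layout, rows, cols):
    """Count the distinct chunks read when accessing the given rows and columns."""
    return len({layout[(r, c)] for r in rows for c in cols})


# Assumed toy sizes: both layouts use 16 elements per chunk.
N, TILE = 8, 4
continuous = row_major_chunks(N, N, TILE * TILE)  # each chunk holds 2 full rows
tiled = tiled_chunks(N, N, TILE)                  # each chunk holds one 4x4 block

patterns = {
    "one row": (range(0, 1), range(N)),
    "one column": (range(N), range(0, 1)),
    "4x4 block": (range(0, 4), range(0, 4)),
}

# results[name] = (chunks touched in continuous layout, chunks touched in tiled layout)
results = {
    name: (chunks_touched(continuous, rows, cols), chunks_touched(tiled, rows, cols))
    for name, (rows, cols) in patterns.items()
}

for name, (cont, til) in results.items():
    print(f"{name}: continuous={cont} chunks, tiled={til} chunks")
```

In this toy setting the row-major layout favours row access but needs four chunks for a single column, while the sub-matrix layout needs at most two chunks for any of the three patterns. That is the kind of pattern-independent behaviour the abstract describes, although the real layout and its cost model may differ.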

Bibliographic record

  • Source
    Parallel Algorithms and Applications | 2013, Issue 5 | pp. 407-433 | 27 pages
  • Authors

    Jun Wang; Lu Cheng; Lizhe Wang;

  • Author affiliations

    Department of Electrical Engineering and Computer Science, University of Central Florida, 4000 Central Florida Blvd, Orlando, FL 32816, USA;

    Department of Electrical Engineering and Computer Science, University of Central Florida, 4000 Central Florida Blvd, Orlando, FL 32816, USA;

    Center for Earth Observation and Digital Earth, Chinese Academy of Sciences, Beijing 100094, P.R. China;

  • Original format: PDF
  • Text language: English
  • Keywords

    data access pattern; Hadoop distributed file system; matrix file; data layout;
