Hybrid storage architecture and efficient MapReduce processing for unstructured data

Abstract

As we enter the era of the data deluge, managing massive data efficiently is becoming a great challenge, especially for exponentially growing unstructured data, whose volume far exceeds that of structured and semi-structured data. Unstructured data is also more complex because of its variety: different types of unstructured data differ in file size, type, and usage, and therefore need different storage and processing to be handled efficiently. In this paper, we propose a hybrid storage architecture for storing pervasive unstructured data. The architecture integrates various kinds of data stores within a unified framework, in which each type of unstructured data is matched with a suitable placement policy in a way that is transparent to users. In addition, we present several partitioning strategies based on the unified framework that benefit MapReduce-based batch processing of this unstructured data. The experiments demonstrate that the hybrid architecture and the partitioning strategies make it possible to build an efficient and smart system. (C) 2017 Elsevier B.V. All rights reserved.

Bibliographic details

  • Source
    Parallel Computing | 2017, Issue 11 | pp. 63-77 | 15 pages
  • Author affiliations

    Zhejiang Univ, Coll Comp Sci, Hangzhou, Zhejiang, Peoples R China;

    Zhejiang Univ, Coll Comp Sci, Hangzhou, Zhejiang, Peoples R China;

    Zhejiang Univ, Coll Comp Sci, Hangzhou, Zhejiang, Peoples R China;

    Zhejiang Univ, Coll Comp Sci, Hangzhou, Zhejiang, Peoples R China;

    Zhejiang Univ, Coll Comp Sci, Hangzhou, Zhejiang, Peoples R China;

    Zhejiang Univ, Coll Comp Sci, Hangzhou, Zhejiang, Peoples R China;

  • Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
  • Original format: PDF
  • Language: English
  • Chinese Library Classification
  • Keywords

    Hybrid storage; Partitioning strategy; MapReduce-based data processing;

