Future Generation Computer Systems

A MapReduce-based scalable discovery and indexing of structured big data



Abstract

Various methods and techniques have been proposed in the past to improve the performance of queries on structured and unstructured data. This paper proposes a parallel B-Tree index in the MapReduce framework that improves the efficiency of random reads over existing approaches. The benefit of the MapReduce framework is that it hides the complexity of implementing parallelism and fault tolerance from users and exposes these capabilities in a user-friendly way. The proposed index reduces the number of data accesses for range queries, which improves efficiency. The B-Tree index on MapReduce is implemented as a chained-MapReduce process. This reduces the time spent accessing intermediate data between successive map and reduce functions, further improving efficiency. Finally, five performance metrics are used to validate the proposed index for range search queries in MapReduce. These include the effect of varying the cluster size and the range search query coverage on execution time, the number of map tasks, and the size of Input/Output (I/O) data. The proposed index also proves superior when the Hadoop Distributed File System (HDFS) block size is varied, and in an analysis of heap memory size and of the intermediate data generated during the map and reduce functions. Experimental results show that the parallel B-Tree index in a chained-MapReduce environment outperforms both Hadoop's default non-indexed dataset and the B-Tree-like Global Index (Zhao et al., 2012) in MapReduce.
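The abstract's central claim is that an index over key-sorted data lets a range query read fewer blocks and launch fewer map tasks than a full scan. A small simulation can illustrate this. The sketch below is not the paper's parallel B-Tree and does not use Hadoop. It assumes that records are sorted by key and split into fixed-size blocks standing in for HDFS blocks. In place of a B-Tree's leaf level, it uses a simplified single-level sparse index: the first key of each block, searched with Python's `bisect`. The function names (`build_blocks`, `build_index`, `candidate_blocks`, `map_range`, `range_query`, `full_scan`) are hypothetical and chosen for illustration.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

Record = Tuple[int, str]  # (key, value)
Block = List[Record]      # hypothetical stand-in for one HDFS block


def build_blocks(records: List[Record], block_size: int) -> List[Block]:
    """Sort records by key and cut them into fixed-size blocks."""
    ordered = sorted(records)
    return [ordered[i:i + block_size] for i in range(0, len(ordered), block_size)]


def build_index(blocks: List[Block]) -> List[int]:
    """Sparse index: the first (smallest) key of each block, in ascending order.

    This is a simplified stand-in for a B-Tree leaf level, not the paper's index.
    """
    return [block[0][0] for block in blocks]


def candidate_blocks(index: List[int], lo: int, hi: int) -> List[int]:
    """Return ids of the blocks that may contain keys in [lo, hi].

    bisect_left keeps the lookup conservative when equal keys straddle
    a block boundary; bisect_right selects blocks whose first key <= hi.
    """
    start = max(bisect_left(index, lo) - 1, 0)
    end = bisect_right(index, hi)
    return list(range(start, end))


def map_range(block: Block, lo: int, hi: int) -> List[Record]:
    """One simulated map task: emit the block's records with lo <= key <= hi."""
    return [(k, v) for k, v in block if lo <= k <= hi]


def range_query(blocks: List[Block], index: List[int], lo: int, hi: int):
    """Indexed query: run map tasks only on candidate blocks, then merge (reduce)."""
    ids = candidate_blocks(index, lo, hi)
    outputs = [map_range(blocks[i], lo, hi) for i in ids]
    return sorted(r for out in outputs for r in out), len(ids)


def full_scan(blocks: List[Block], lo: int, hi: int):
    """Non-indexed baseline: one map task per block, then merge (reduce)."""
    outputs = [map_range(b, lo, hi) for b in blocks]
    return sorted(r for out in outputs for r in out), len(blocks)


if __name__ == "__main__":
    records = [(k, f"row-{k}") for k in range(1, 101)]
    blocks = build_blocks(records, block_size=10)
    index = build_index(blocks)
    rows, tasks = range_query(blocks, index, 35, 52)
    baseline, full_tasks = full_scan(blocks, 35, 52)
    assert rows == baseline  # same answer, fewer map tasks
    print(len(rows), tasks, full_tasks)  # prints: 18 3 10
```

In this example, 100 keys are split into blocks of 10. The indexed query for keys 35 to 52 returns the same 18 rows as the full scan but runs 3 map tasks instead of 10. A real B-Tree adds internal nodes so that the index itself can be stored and paged in fixed-size nodes. The block-pruning idea shown here stays the same.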


