Conference: International Conference on Algorithms and Architectures for Parallel Processing

LuBase: A Search-Efficient Hybrid Storage System for Massive Text Data

Abstract

Recent years have seen great enthusiasm for big data analytics systems. Some of them, offering high scalability and fault tolerance, are widely used in production. However, such systems are mostly designed to process immutable data stored in HDFS and are not well suited to real-time text data in NoSQL databases such as HBase. In this paper, we propose LuBase, a search-efficient hybrid storage system for large-scale text data analytics. Beyond being a novel hybrid storage system with a fine-grained index, LuBase introduces a new query processing flow that fully exploits a pre-built full-text index to accelerate interactive queries while achieving more efficient I/O. We implemented LuBase in a data analytics system based on Impala. Experimental results demonstrate that LuBase benefits substantially from the Lucene indexing technique and significantly improves Impala's performance when querying HBase.
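
The general idea of answering a text query from a pre-built index instead of a full scan can be illustrated with a toy sketch. This is not LuBase's implementation and uses neither Lucene, HBase, nor Impala: the class `ToyHybridStore`, its methods, and the `rows_read` counter are all hypothetical names, and a plain dictionary stands in for both the row store and the inverted index.

```python
from collections import defaultdict


class ToyHybridStore:
    """Hypothetical stand-in: a key-value row store paired with an inverted index."""

    def __init__(self):
        self.rows = {}                  # row key -> text (the "storage" side)
        self.index = defaultdict(set)   # term -> row keys (the "index" side)
        self.rows_read = 0              # counts row fetches, as a rough I/O proxy

    def put(self, row_key, text):
        # Drop postings of an overwritten row so the index stays consistent.
        old = self.rows.get(row_key)
        if old is not None:
            for term in set(old.lower().split()):
                self.index[term].discard(row_key)
        self.rows[row_key] = text
        for term in set(text.lower().split()):
            self.index[term].add(row_key)

    def search_by_scan(self, term):
        # Baseline: read every row and filter afterwards.
        term = term.lower()
        hits = []
        for key in sorted(self.rows):
            self.rows_read += 1
            if term in self.rows[key].lower().split():
                hits.append((key, self.rows[key]))
        return hits

    def search_by_index(self, term):
        # Indexed path: look up matching row keys first, then fetch only those rows.
        keys = sorted(self.index.get(term.lower(), ()))
        self.rows_read += len(keys)
        return [(key, self.rows[key]) for key in keys]


store = ToyHybridStore()
store.put("r1", "impala queries hbase tables")
store.put("r2", "lucene builds a full-text index")
store.put("r3", "hbase stores real-time text")

scan_hits = store.search_by_scan("hbase")
scan_reads = store.rows_read

store.rows_read = 0
index_hits = store.search_by_index("hbase")
index_reads = store.rows_read
```

Both paths return the same rows, but the indexed path fetches only the rows that match, which is the kind of I/O saving the abstract attributes to using a pre-built full-text index.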
