HiBase: An Efficient HBase Query Technique and System Based on Hierarchical Indexing

葛微; 罗圣美; 周文辉; 赵頔; 唐云; 周娟; 曲文武; 袁春风; 黄宜华

Abstract

We have entered the big data era, and the volume of data in many business areas is growing explosively. There is an urgent need for efficient methods to store and manage big data and to provide real-time or near-real-time queries for data analysis. Hadoop HBase provides a highly scalable technical approach and system platform for storing and querying big data. However, HBase indexes only the row key and does not support indexes on non-key columns, so non-key queries are slow and cannot meet the needs of real-time or near-real-time applications. Providing fast non-key queries on top of HBase is therefore an important open problem in the Hadoop environment.

In this paper, we propose a hierarchical secondary indexing model and method for HBase. It first builds a persistent secondary index layer for the non-key columns of an HBase table to speed up queries. To exploit memory for further speedup, we then present a hot-index cache mechanism with an efficient cache replacement policy, the Hotscore algorithm, which reduces the disk access overhead of reading the HBase index tables. The Hotscore algorithm overcomes the limitations of the Least Recently Used (LRU) policy: to distinguish hot from cold index data more precisely and to capture the temporal locality of data accesses, it accumulates the access frequency of each record and periodically decays the accumulated value exponentially. In addition, we design a distributed in-memory cache protocol based on consistent hashing, which gives the hot-index cache layer good scalability and supports efficient non-key point queries and range queries. Finally, we implement a complete hierarchical indexing and query system, HiBase.

Experimental results on datasets ranging from 10 million to 1 billion records show that HiBase cold queries (cache-missed queries) outperform standard HBase by 65 times (for large result sets) to more than 3000 times (for small result sets). With the Hotscore cache replacement policy, HiBase hot queries (cache-hit queries) achieve an additional 5 to 15 times speedup over HiBase cold queries, bringing the overall speedup to more than 300 times (for large result sets) to 17,000 times (for small result sets) over standard HBase, and 5 to 20 times over the open-source Hindex secondary indexing system.
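
The abstract describes the Hotscore policy only at a high level: accumulate how often each cached record is accessed, and periodically decay the accumulated value exponentially so that old popularity fades. The Python sketch below illustrates that idea under our own assumptions, not the paper's formula: the class name `HotscoreCache`, a +1 increment per access, a decay factor of 0.5 applied every `period` operations, and eviction of the lowest-scoring entry are all hypothetical choices.

```python
class HotscoreCache:
    """Hypothetical sketch of a heat-accumulating cache (not the paper's code).

    Each access adds 1.0 to the key's score; every `period` operations all
    scores are multiplied by `decay`, so old popularity fades exponentially.
    When the cache is full, the key with the lowest score is evicted.
    """

    def __init__(self, capacity: int, decay: float = 0.5, period: int = 100):
        if capacity <= 0 or period <= 0:
            raise ValueError("capacity and period must be positive")
        if not 0.0 < decay < 1.0:
            raise ValueError("decay must lie strictly between 0 and 1")
        self.capacity = capacity
        self.decay = decay
        self.period = period
        self._values = {}   # key -> cached index entry
        self._scores = {}   # key -> accumulated, decayed hot score
        self._ops = 0       # operation counter that drives periodic decay

    def _tick(self) -> None:
        self._ops += 1
        if self._ops % self.period == 0:
            for key in self._scores:          # values change, keys do not
                self._scores[key] *= self.decay

    def get(self, key):
        """Return the cached value, or None on a miss."""
        self._tick()
        if key not in self._values:
            return None
        self._scores[key] += 1.0
        return self._values[key]

    def put(self, key, value) -> None:
        self._tick()
        if key not in self._values and len(self._values) >= self.capacity:
            victim = min(self._scores, key=self._scores.get)  # coldest entry
            del self._values[victim]
            del self._scores[victim]
        self._values[key] = value
        self._scores[key] = self._scores.get(key, 0.0) + 1.0


# Usage: "a" is hot but not recent; LRU would evict it, this policy evicts "b".
cache = HotscoreCache(capacity=2, decay=0.5, period=1000)
cache.put("a", 1)
for _ in range(5):
    cache.get("a")
cache.put("b", 2)
cache.get("b")
cache.put("c", 3)
print(cache.get("a"), cache.get("b"))  # 1 None
```

The usage lines mirror the contrast with LRU drawn in the abstract: "b" is the most recently used entry when "c" arrives, yet it is evicted because its accumulated score (2.0) is lower than that of "a" (6.0). The periodic decay keeps an entry that was hot long ago from occupying the cache forever.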

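The abstract states only that the distributed cache protocol is based on consistent hashing. As a hedged illustration of that general technique, not HiBase's actual protocol, the sketch below places cache servers on a hash ring with virtual nodes and routes each index key to the first server point clockwise from the key's hash. The class name `ConsistentHashRing`, the node names, MD5 as the ring hash, and 100 virtual nodes per server are all assumptions.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Map a string to a point on a 2**128 ring via MD5 (any stable hash works).
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class ConsistentHashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes
        self._points = []   # sorted hash positions on the ring
        self._owner = {}    # hash position -> cache server name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            if point in self._owner:
                continue  # skip the (astronomically unlikely) collision
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove_node(self, node: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring is empty")
        # First point clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


# Usage: add a fourth cache server and count how many index keys move.
ring = ConsistentHashRing(["cache-1", "cache-2", "cache-3"])
keys = [f"index-key-{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.add_node("cache-4")
moved = [k for k in keys if ring.node_for(k) != before[k]]
print(len(moved), "of", len(keys), "keys moved")
```

This is why consistent hashing suits a scalable cache layer: when a server joins, only the keys whose ring successor becomes one of the new server's points are remapped (all to the new server), instead of rehashing every cached index entry.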
Bibliographic record

Source
Chinese Journal of Computers (《计算机学报》), 2016, No. 1, pp. 140-153 (14 pages)
Author affiliations

State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210046;

Jiangsu Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210046;

ZTE Corporation, Nanjing 210012;

Department of Computer Science and Technology, Tsinghua University, Beijing 100084;

Original format: PDF
Language: Chinese
Chinese Library Classification: Programming; software engineering
Keywords
HBase; non-key index; query processing; hierarchical index; cache replacement policy; big data
