Research on an Index Supporting Range Queries over Probabilistic Data in Big Data Environments

ZHU Rui; WANG Bin; YANG Xiaochun; WANG Guoren

Abstract

As data volumes keep growing, big data management becomes increasingly important. Among common mathematical models, the probabilistic model is well suited to big data management because it can summarize a massive volume of data as a small number of probabilistic objects. Studying probabilistic data management in big data environments is therefore worthwhile. Range queries over probabilistic data are a classic query type and have been studied extensively. However, state-of-the-art indexes are poorly suited to big data environments, chiefly because their update cost is high. This paper proposes a novel index, HGD-Tree, to address this problem. First, we propose a set of strategies for handling newly arriving objects, so that insertions, deletions, and updates can be carried out efficiently while the tree stays balanced. Second, we propose a partition-based structure that approximates the probability density function of each object and that self-adjusts its partition resolution to fit the characteristics of the underlying uncertain data. Moreover, this structure is represented by a small number of bit vectors. Together, these two design choices keep the space cost of the index low. Finally, we propose a range query algorithm for HGD-Tree that prunes effectively using only a few bitwise operations. Theoretical analysis and extensive experiments demonstrate the effectiveness of the proposed algorithms.
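The abstract does not give the layout of the bit-vector summary, so the following is only a minimal sketch of the general idea of pruning a range query with bitwise operations. It is not the HGD-Tree itself. It assumes a fixed 8 x 8 grid over the unit square (the paper describes a self-adjusting resolution), uncertain objects given as sample points, and a half-open query rectangle. The names `GRID`, `object_mask`, `query_masks`, and `classify` are hypothetical.

```python
from typing import Iterable, Tuple

GRID = 8  # assumed fixed resolution: GRID x GRID cells over [0, 1) x [0, 1)


def cell_bit(ix: int, iy: int) -> int:
    """Bit position of grid cell (ix, iy) inside one integer bit vector."""
    return 1 << (iy * GRID + ix)


def object_mask(samples: Iterable[Tuple[float, float]]) -> int:
    """Bit vector with a 1 for every cell that holds at least one sample."""
    mask = 0
    for x, y in samples:
        ix = min(int(x * GRID), GRID - 1)  # clamp points on the upper edge
        iy = min(int(y * GRID), GRID - 1)
        mask |= cell_bit(ix, iy)
    return mask


def query_masks(x1: float, y1: float, x2: float, y2: float) -> Tuple[int, int]:
    """Masks of cells touched by, and fully covered by, [x1, x2) x [y1, y2)."""
    touched = covered = 0
    for iy in range(GRID):
        for ix in range(GRID):
            cx1, cy1 = ix / GRID, iy / GRID
            cx2, cy2 = (ix + 1) / GRID, (iy + 1) / GRID
            if cx1 < x2 and x1 < cx2 and cy1 < y2 and y1 < cy2:
                touched |= cell_bit(ix, iy)
                if x1 <= cx1 and cx2 <= x2 and y1 <= cy1 and cy2 <= y2:
                    covered |= cell_bit(ix, iy)
    return touched, covered


def classify(obj_mask: int, touched: int, covered: int) -> str:
    """Decide an object's fate with two bitwise tests and no integration."""
    if obj_mask & touched == 0:
        return "prune"   # no sampled mass in any cell the query touches
    if obj_mask & ~covered == 0:
        return "accept"  # all sampled mass lies in fully covered cells
    return "refine"      # partial overlap: the exact probability is needed


if __name__ == "__main__":
    touched, covered = query_masks(0.3, 0.3, 0.7, 0.7)
    far_object = object_mask([(0.9, 0.9)])
    inner_object = object_mask([(0.4, 0.4), (0.55, 0.6)])
    print(classify(far_object, touched, covered))
    print(classify(inner_object, touched, covered))
```

Only objects classified as `refine` would need their appearance probability computed against the query range; the other two outcomes are settled by one AND each. A probability-threshold query would need extra information per cell, such as one bit vector per probability level, which this sketch leaves out.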

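Bibliographic Information

Source
Chinese Journal of Computers (《计算机学报》) | 2016, Issue 10 | pp. 1929-1946 | 18 pages
Authors
ZHU Rui; WANG Bin; YANG Xiaochun; WANG Guoren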
Affiliation

School of Information Science and Engineering, Northeastern University, Shenyang 110004 (listed for each of the four authors)

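Original format: PDF
Full-text language: Chinese
CLC classification: Programming, software engineering
Keywords
big data; probabilistic data; index; probabilistic summary; multi-resolution grid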
