
Summarization of very large spatial dataset

Abstract

Nowadays a large number of applications, such as digital library information retrieval, business data analysis, CAD/CAM, multimedia applications with images and sound, real-time process control and scientific computation, work with data sets on the order of gigabytes, terabytes or even petabytes. Because such data distributions are too large to be stored exactly, maintaining compact and accurate summary information about the underlying data is of crucial importance.

The summarization problem for the Level 1 topological relationship (disjoint and non-disjoint) has been well studied over the past few years. However, spatial database users are often interested in a much richer set of spatial relations, such as contains. Little work has been done on summarization for the Level 2 topological relationship, which comprises the contains, contained, overlap, equal and disjoint relations.

We study the problem of effective summarization that represents the underlying data distribution in order to answer window queries for the Level 2 topological relationship. The cell-density-based approach has been shown to be effective for this problem. Its challenges, however, are the accuracy of the results and the required storage space, which must grow only linearly with the number of cells for the approach to be practical.

In this thesis, we present several novel techniques to effectively construct cell-density-based spatial histograms. With the proposed framework, exact results can be obtained in constant time for aligned window queries. To minimize the storage space of the framework, we present an approximation algorithm with approximation ratio 19/12, while the general problem is shown to be NP-hard. Because the framework requires storage space only linearly proportional to the number of cells, it is practical for many popular real datasets. To fit within a limited storage budget, we propose effective histogram construction and query algorithms that provide approximate but highly accurate results.
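The abstract does not spell out the thesis's own histogram construction. As a point of reference only, the following minimal sketch shows the classic Euler histogram, a well-known cell-density technique that answers the Level 1 question (how many objects are non-disjoint from an aligned window) exactly in constant time via prefix sums, alongside a brute-force classifier for the five Level 2 relations. Assumptions: objects and windows are axis-aligned rectangles given as inclusive cell-index ranges `(x0, x1, y0, y1)`; "contains" is taken to mean that the object contains the window; `level2_relation` and `EulerHistogram` are hypothetical names. Counting each Level 2 relation separately in linear space, which is what the thesis targets, is not shown.

```python
# Hypothetical names; objects and windows are axis-aligned rectangles given as
# inclusive cell-index ranges (x0, x1, y0, y1) on an nx-by-ny grid.
Rect = tuple[int, int, int, int]


def level2_relation(obj: Rect, win: Rect) -> str:
    """Brute-force Level 2 relation of an object to a query window, at cell granularity.

    Assumption: "contains" means the object contains the window and
    "contained" means the object lies inside the window.
    """
    ox0, ox1, oy0, oy1 = obj
    wx0, wx1, wy0, wy1 = win
    if ox1 < wx0 or wx1 < ox0 or oy1 < wy0 or wy1 < oy0:
        return "disjoint"
    if obj == win:
        return "equal"
    if ox0 <= wx0 and wx1 <= ox1 and oy0 <= wy0 and wy1 <= oy1:
        return "contains"
    if wx0 <= ox0 and ox1 <= wx1 and wy0 <= oy0 and oy1 <= wy1:
        return "contained"
    return "overlap"


class EulerHistogram:
    """Classic Euler histogram: exact non-disjoint counts for aligned windows in O(1)."""

    def __init__(self, nx: int, ny: int, objects: list[Rect]) -> None:
        # Euler array of size (2nx-1) x (2ny-1): (even, even) entries are cells,
        # (odd, odd) entries are interior vertices, mixed-parity entries are interior edges.
        w, h = 2 * nx - 1, 2 * ny - 1
        grid = [[0] * h for _ in range(w)]
        for x0, x1, y0, y1 in objects:
            for i in range(2 * x0, 2 * x1 + 1):
                for j in range(2 * y0, 2 * y1 + 1):
                    # Cells and vertices count +1, edges count -1 (Euler characteristic).
                    grid[i][j] += 1 if i % 2 == j % 2 else -1
        # 2D prefix sums with a zero border so any rectangular sum costs O(1).
        self._p = [[0] * (h + 1) for _ in range(w + 1)]
        for i in range(w):
            for j in range(h):
                self._p[i + 1][j + 1] = (
                    grid[i][j] + self._p[i][j + 1] + self._p[i + 1][j] - self._p[i][j]
                )

    def count_non_disjoint(self, win: Rect) -> int:
        """Number of objects sharing at least one cell with the aligned window."""
        x0, x1, y0, y1 = win
        i0, i1, j0, j1 = 2 * x0, 2 * x1 + 1, 2 * y0, 2 * y1 + 1
        p = self._p
        return p[i1][j1] - p[i0][j1] - p[i1][j0] + p[i0][j0]


if __name__ == "__main__":
    objs = [(0, 1, 0, 1), (2, 3, 2, 3), (1, 2, 1, 2), (0, 3, 0, 3)]
    hist = EulerHistogram(4, 4, objs)
    window = (1, 2, 1, 2)
    print([level2_relation(o, window) for o in objs])
    # ['overlap', 'overlap', 'equal', 'contains']
    print(hist.count_non_disjoint(window))
    # 4
```

The sum is exact because each object's overlap with an aligned window is a block of k × l cells whose signed sum is kl - (k-1)l - k(l-1) + (k-1)(l-1) = 1, so every non-disjoint object contributes exactly one regardless of its size.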

The problem of non-aligned window queries is also investigated, and techniques based on unevenly partitioned space are developed to support them. Finally, we extend our techniques to 3D space. Extensive experiments on both synthetic and real-world datasets demonstrate the efficiency of the algorithms developed in this thesis.
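One simple way to see why non-aligned windows are harder is to snap such a window to the largest aligned window inside it and the smallest aligned window covering it. Because the set of objects intersecting a region can only grow as the region grows, the two aligned counts bracket the true non-disjoint count. The sketch below illustrates this on a uniform grid only; it is not the thesis's uneven-partitioning technique. `snap_window`, `bracket_count` and the uniform cell size are hypothetical, and the window is assumed to lie inside the grid extent with x0 < x1 and y0 < y1.

```python
import math
from typing import Callable, Optional

# Hypothetical helpers: windows are real-valued (x0, x1, y0, y1) with x0 < x1, y0 < y1;
# the grid is uniform with nx-by-ny cells of size cell_w-by-cell_h starting at the origin.
CellRange = tuple[int, int, int, int]


def snap_window(
    win: tuple[float, float, float, float],
    cell_w: float, cell_h: float, nx: int, ny: int,
) -> tuple[CellRange, Optional[CellRange]]:
    """Return (outer, inner) aligned windows as inclusive cell ranges.

    outer: every cell the window overlaps; inner: cells lying entirely inside
    the window, or None if no whole cell fits.
    """
    wx0, wx1, wy0, wy1 = win
    outer = (
        max(0, math.floor(wx0 / cell_w)), min(nx - 1, math.ceil(wx1 / cell_w) - 1),
        max(0, math.floor(wy0 / cell_h)), min(ny - 1, math.ceil(wy1 / cell_h) - 1),
    )
    ix0, ix1 = max(0, math.ceil(wx0 / cell_w)), min(nx - 1, math.floor(wx1 / cell_w) - 1)
    iy0, iy1 = max(0, math.ceil(wy0 / cell_h)), min(ny - 1, math.floor(wy1 / cell_h) - 1)
    inner = (ix0, ix1, iy0, iy1) if ix0 <= ix1 and iy0 <= iy1 else None
    return outer, inner


def bracket_count(
    count_aligned: Callable[[CellRange], int],
    win: tuple[float, float, float, float],
    cell_w: float, cell_h: float, nx: int, ny: int,
) -> tuple[int, int]:
    """Lower and upper bounds on the non-disjoint count for a non-aligned window.

    Valid because inner <= window <= outer and the set of objects intersecting a
    region can only grow as the region grows.
    """
    outer, inner = snap_window(win, cell_w, cell_h, nx, ny)
    lower = count_aligned(inner) if inner is not None else 0
    return lower, count_aligned(outer)


if __name__ == "__main__":
    print(snap_window((0.5, 2.5, 1.0, 3.0), 1.0, 1.0, 4, 4))
    # ((0, 2, 1, 2), (1, 1, 1, 2))
```

Any exact aligned-window counter (for instance a cell-density histogram) can be passed as `count_aligned`. An uneven partition would replace the uniform floor/ceil arithmetic with a search over per-axis split points, leaving the bracketing argument unchanged.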