首页> 外文OA文献 >Index-based Query Processing Methods for Text and Spatial Databases
【2h】

Index-based Query Processing Methods for Text and Spatial Databases

机译:文本和空间数据库的基于索引的查询处理方法

摘要

Search queries that retrieve text or spatial data from databases according to user requirements, are prevalent and of great importance to many applications areas such as data mining, information retrieval, pattern recognition, and clustering problems. Driven by the wide applications and requests for near real-time query response, many efforts have been devoted to the efficient support of such queries.In this thesis, we study three specific problems in this field, and explore highly efficient query processing methods by utilizing precomputed index. The first one is error-tolerant query autocompletion, which finds all data strings whose prefix edit distance to query is no larger than the given edit distance threshold. Existing methods solve the problem by incrementally maintaining a set of trie nodes (i.e. active nodes) that are within distance threshold to current query over a trie index. These methods suffer from long maintenance and result fetching time brought by the plethora of ancestor-descendant relationships among intermediate results. We propose a boundary set based solution (BEVA) that achieves the minimum active node size by completely eliminating ancestor-descendant relationships in the intermediate results, with the help of novel index structures Edit Vector Automaton (EVA) and Universal Partitioned EVA (UPEVA) that support fast edit distance computation. Our proposed UPEVA handles arbitrarily large thresholds, hence is truly universal compared with the existing Universal Deterministic Levenshtein Automata (UDLA). We conduct extensive experiments to show the superiority of our methods over existing works.The second problem is maximizing range sum (maxRS), a type of search query on spatial objects. Given a set of weighted spatial objects and a query region, the maxRS problem finds the optimal placement of the region such that the total weight of objects covered by the region is maximized. Currently, the most efficient approaches solve the problem in time O(nlogn) based on a plane-sweeping technique. This is not efficient for frequent queries with distinct parameters. We propose another solution with different space-time tradeoff, which achieves O(logn) query time by indexing changing points in the maxRS results matrix with size bounded by O(n3). Our proposed index can also be used to solve other related problems such as k-enclosing problem and maximum point enclosing problems in computational geometry.The third problem handles maxRS query on road network. Another index-based method using circle-sweeping technique is proposed, which is demonstrated to outperform existing methods around 6+ orders of magnitude with an index of size linear in data size.
机译:根据用户需求从数据库检索文本或空间数据的搜索查询非常普遍,并且对于许多应用程序领域(例如数据挖掘,信息检索,模式识别和聚类问题)非常重要。在近乎实时的查询响应的广泛应用和要求的推动下,人们为有效支持此类查询做出了许多努力。本文研究了该领域中的三个具体问题,并探索了高效的查询处理方法。预计算索引。第一个是容错查询自动补全功能,它查找要查询的前缀编辑距离不大于给定编辑距离阈值的所有数据字符串。现有方法通过增量地维护在特里索引上的当前查询的距离阈值以内的特里节点(即活动节点)的集合来解决该问题。这些方法维护时间长,并且由于中间结果之间存在过多的祖先后代关系而导致获取结果的时间变长。我们提出了一种基于边界集的解决方案(BEVA),该解决方案通过在新颖的索引结构Edit Vector Automaton(EVA)和Universal Partitioned EVA(UPEVA)的帮助下,通过完全消除中间结果中的祖先后代关系来实现最小活动节点大小。支持快速编辑距离计算。我们提出的UPEVA可处理任意大的阈值,因此与现有的通用确定性Levenshtein自动机(UDLA)相比,它确实具有通用性。我们进行了广泛的实验,以证明我们的方法优于现有作品。第二个问题是最大化范围和(maxRS),这是一种对空间物体的搜索查询。给定一组加权空间对象和一个查询区域,maxRS问题将找到该区域的最佳位置,以使该区域所覆盖的对象的总权重最大化。当前,最有效的方法是基于平面扫描技术来解决时间O(nlogn)的问题。这对于具有不同参数的频繁查询而言效率不高。我们提出了另一个具有不同时空权衡的解决方案,该解决方案通过索引maxRS结果矩阵中以O(n3)为边界的变化点来实现O(logn)查询时间。我们提出的索引还可以用于解决其他相关问题,例如计算几何中的k封闭问题和最大点封闭问题。第三个问题处理道路网络上的maxRS查询。提出了另一种使用圆扫技术的基于索引的方法,该方法被证明在数据大小为线性的情况下优于6+个数量级的现有方法。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号