Search queries that retrieve text or spatial data from databases according to user requirements, are prevalent and of great importance to many applications areas such as data mining, information retrieval, pattern recognition, and clustering problems. Driven by the wide applications and requests for near real-time query response, many efforts have been devoted to the efficient support of such queries.In this thesis, we study three specific problems in this field, and explore highly efficient query processing methods by utilizing precomputed index. The first one is error-tolerant query autocompletion, which finds all data strings whose prefix edit distance to query is no larger than the given edit distance threshold. Existing methods solve the problem by incrementally maintaining a set of trie nodes (i.e. active nodes) that are within distance threshold to current query over a trie index. These methods suffer from long maintenance and result fetching time brought by the plethora of ancestor-descendant relationships among intermediate results. We propose a boundary set based solution (BEVA) that achieves the minimum active node size by completely eliminating ancestor-descendant relationships in the intermediate results, with the help of novel index structures Edit Vector Automaton (EVA) and Universal Partitioned EVA (UPEVA) that support fast edit distance computation. Our proposed UPEVA handles arbitrarily large thresholds, hence is truly universal compared with the existing Universal Deterministic Levenshtein Automata (UDLA). We conduct extensive experiments to show the superiority of our methods over existing works.The second problem is maximizing range sum (maxRS), a type of search query on spatial objects. Given a set of weighted spatial objects and a query region, the maxRS problem finds the optimal placement of the region such that the total weight of objects covered by the region is maximized. Currently, the most efficient approaches solve the problem in time O(nlogn) based on a plane-sweeping technique. This is not efficient for frequent queries with distinct parameters. We propose another solution with different space-time tradeoff, which achieves O(logn) query time by indexing changing points in the maxRS results matrix with size bounded by O(n3). Our proposed index can also be used to solve other related problems such as k-enclosing problem and maximum point enclosing problems in computational geometry.The third problem handles maxRS query on road network. Another index-based method using circle-sweeping technique is proposed, which is demonstrated to outperform existing methods around 6+ orders of magnitude with an index of size linear in data size.
展开▼