Journal of Intelligent Information Systems

Token list based information search in a multi-dimensional massive database

Abstract

Finding proximity information is crucial for massive database search. Locality Sensitive Hashing (LSH) is a method for finding nearest neighbors of a query point in a high-dimensional space. It classifies high-dimensional data according to data similarity. However, the "curse of dimensionality" makes LSH insufficiently effective in finding similar data and insufficiently efficient in terms of memory resources and search delays. The contribution of this work is threefold. First, we study a Token List based information Search scheme (TLS) as an alternative to LSH. TLS builds a token list table containing all the unique tokens from the database, and clusters data records having the same token together in one group. Querying is conducted in a small number of groups of relevant data records instead of searching the entire database. Second, in order to decrease the searching time of the token list, we further propose the Optimized Token list based Search schemes (OTS) based on index-tree and hash table structures. An index-tree structure orders the tokens in the token list and constructs an index table based on the tokens. Searching the token list starts from the entry of the token list supplied by the index table. A hash table structure assigns a hash ID to each token. A query token can be directly located in the token list according to its hash ID. Third, since a single-token based method leads to high overhead in the results refinement given a required similarity, we further investigate how a Multi-Token List Search scheme (MTLS) improves the performance of database proximity search. We conducted experiments on the LSH-based searching scheme, TLS, OTS, and MTLS using a massive customer data integration database. The comparative experimental results show that TLS is more efficient than an LSH-based searching scheme, and OTS improves the search efficiency of TLS. Further, MTLS performs better than TLS when the number of tokens is appropriately chosen, and a two-token adjacent token list achieves the shortest query delay in our testing dataset.
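
The token-list idea described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the whitespace tokenizer, the function names (`build_token_list`, `candidate_ids`, `search`), the Jaccard similarity used for refinement, the 0.5 threshold, and the toy records. A Python `dict` stands in for the hash-table lookup, and `n=2` approximates a two-token adjacent token list.

```python
from __future__ import annotations

from collections import defaultdict


def tokenize(record: str) -> list[str]:
    """Split a record into lowercase tokens (assumed whitespace tokenizer)."""
    return record.lower().split()


def adjacent_tokens(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Return every run of n adjacent tokens; n=1 yields single tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_token_list(records: list[str], n: int = 1) -> dict[tuple[str, ...], set[int]]:
    """Map each unique n-token key to the IDs of the records containing it."""
    token_list = defaultdict(set)
    for record_id, record in enumerate(records):
        for key in adjacent_tokens(tokenize(record), n):
            token_list[key].add(record_id)
    return dict(token_list)


def candidate_ids(query: str, token_list: dict, n: int = 1) -> set[int]:
    """Collect only the groups of records that share a key with the query."""
    candidates: set[int] = set()
    for key in adjacent_tokens(tokenize(query), n):
        candidates |= token_list.get(key, set())
    return candidates


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two token sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def search(query: str, records: list[str], token_list: dict,
           n: int = 1, threshold: float = 0.5) -> list[int]:
    """Refine the candidates: keep record IDs meeting the similarity threshold."""
    query_set = set(tokenize(query))
    return [
        record_id
        for record_id in sorted(candidate_ids(query, token_list, n))
        if jaccard(query_set, set(tokenize(records[record_id]))) >= threshold
    ]


# Hypothetical toy records, loosely in the spirit of customer data.
records = [
    "john smith 12 main street",
    "john smyth 12 main street",
    "mary jones 40 oak avenue",
    "john jones 7 pine road",
]
single = build_token_list(records, n=1)  # single-token list
pairs = build_token_list(records, n=2)   # two-token adjacent list
query = "john smith 12 main st"

print(sorted(candidate_ids(query, single, n=1)))  # [0, 1, 3]
print(sorted(candidate_ids(query, pairs, n=2)))   # [0, 1]
print(search(query, records, pairs, n=2))         # [0]
```

On this toy data the single-token list yields three candidates and the two-token list yields two, so fewer records reach the refinement step. That only illustrates the mechanism on four made-up records and says nothing about the measured results reported in the abstract.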
