
Keyword Query Cleaning

Abstract

Unlike traditional database queries, keyword queries do not adhere to a predefined syntax and are often dirty with irrelevant words from natural languages. This makes accurate and efficient keyword query processing over databases a very challenging task.

In this paper, we introduce the problem of query cleaning for keyword search queries in a database context and propose a set of effective and efficient solutions. Query cleaning involves semantic linkage and spelling correction of database-relevant query words, followed by segmentation of nearby query words such that each segment corresponds to a high-quality data term. We define a quality metric for a keyword query and propose a number of algorithms for cleaning keyword queries optimally. We show that the basic optimal query cleaning problem can be solved using a dynamic programming algorithm. We further extend the basic algorithm to address incremental query cleaning and top-k optimal query cleaning. Incremental query cleaning is efficient and memory-bounded, and hence is ideal for scenarios in which the keywords are streamed. The top-k query cleaning algorithm is guaranteed to return the best k cleaned keyword queries in ranked order. Extensive experiments are conducted on three real-life data sets, and the results confirm the effectiveness and efficiency of the proposed solutions.
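The segmentation step described above can be illustrated with a small dynamic program. The following is only a minimal sketch under stated assumptions, not the paper's algorithm: the function name `clean_query`, the `term_scores` dictionary (a stand-in for the quality metric, which the abstract does not specify), and the `max_len` bound on segment length are all hypothetical. Spelling correction and semantic linkage are omitted; tokens that match no data term are simply dropped as irrelevant.

```python
from typing import Dict, List, Optional, Tuple


def clean_query(
    tokens: List[str],
    term_scores: Dict[str, float],  # hypothetical: data term -> quality score
    max_len: int = 3,               # hypothetical cap on tokens per segment
) -> Tuple[float, List[str]]:
    """Return the best total score and the chosen segments, in query order."""
    n = len(tokens)
    # best[i] is the best score achievable for the first i tokens.
    best: List[float] = [0.0] * (n + 1)
    # back[i] records (start index, segment text or None if token dropped).
    back: List[Optional[Tuple[int, Optional[str]]]] = [None] * (n + 1)

    for i in range(1, n + 1):
        # Option 1: treat token i-1 as irrelevant and drop it.
        best[i] = best[i - 1]
        back[i] = (i - 1, None)
        # Option 2: end a segment at position i that matches a known data term.
        for j in range(max(0, i - max_len), i):
            segment = " ".join(tokens[j:i])
            if segment in term_scores:
                candidate = best[j] + term_scores[segment]
                if candidate > best[i]:
                    best[i] = candidate
                    back[i] = (j, segment)

    # Walk the back-pointers to recover the chosen segments.
    segments: List[str] = []
    i = n
    while i > 0:
        j, segment = back[i]
        if segment is not None:
            segments.append(segment)
        i = j
    segments.reverse()
    return best[n], segments


# Illustrative, made-up term scores; a real system would derive them from the database.
example_scores = {
    "new": 0.3,
    "york": 0.5,
    "new york": 2.0,
    "new york times": 3.5,
    "times": 1.0,
    "articles": 1.0,
}
example_result = clean_query("show me new york times articles".split(), example_scores)
```

With these made-up scores, the words "show" and "me" are discarded and the remaining words are grouped into the segments "new york times" and "articles". Each prefix is solved once and reused, which is the property that makes a dynamic programming formulation applicable to this kind of segmentation.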
