
Relaxation in Text Search using Taxonomies


Abstract

In this paper we propose a novel document retrieval model in which text queries are augmented with multi-dimensional taxonomy restrictions. These restrictions may be relaxed at a cost to result quality. This new model may be applicable in many arenas, including multifaceted, product, and local search, where documents are augmented with hierarchical metadata such as topic or location. We present efficient algorithms for indexing and query processing in this new retrieval model. We decompose query processing into two sub-problems: first, an online search problem to determine the correct overall level of relaxation cost that must be incurred to generate the top k results; and second, a budgeted relaxation search problem in which all results at a particular relaxation cost must be produced at minimal cost. We show that the latter problem is solvable exactly in two hierarchical dimensions and is NP-hard in three or more dimensions, but admits efficient approximation algorithms with provable guarantees. We present experimental results evaluating our algorithms on both synthetic and real data, showing order-of-magnitude improvements over the baseline algorithm.
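To make the retrieval model concrete, the following is a minimal brute-force sketch in Python, not the indexing or query-processing algorithms the paper proposes. It assumes details the abstract does not specify: each taxonomy node is written as its path from the root, each document carries exactly one node per dimension, and lifting a query restriction by one level costs one unit. All names (`Doc`, `lift_cost`, `relaxation_cost`, `top_k`, `DOCS`) and the sample data are hypothetical. The outer loop in `top_k` mirrors the first sub-problem (finding the relaxation cost level needed for k results), and the scan at each budget stands in, very inefficiently, for the second.

```python
from typing import NamedTuple

Path = tuple[str, ...]  # a taxonomy node, written as its path from the root


class Doc(NamedTuple):
    doc_id: str
    terms: frozenset[str]
    nodes: tuple[Path, ...]  # one taxonomy node per dimension (assumption)


def lift_cost(query_node: Path, doc_node: Path) -> int:
    """Levels the query node must move up to become an ancestor of
    (or equal to) doc_node, assuming unit cost per level."""
    common = 0
    for q, d in zip(query_node, doc_node):
        if q != d:
            break
        common += 1
    return len(query_node) - common


def relaxation_cost(query_nodes: tuple[Path, ...], doc: Doc) -> int:
    """Minimum total relaxation, summed over dimensions, at which doc matches."""
    return sum(lift_cost(q, d) for q, d in zip(query_nodes, doc.nodes))


def top_k(docs, keywords, query_nodes, k):
    """Raise the relaxation budget one unit at a time until k results exist."""
    wanted = set(keywords)
    text_matches = [d for d in docs if wanted <= d.terms]
    max_budget = sum(len(q) for q in query_nodes)  # every restriction lifted to the root
    results = []
    for budget in range(max_budget + 1):
        # Naive stand-in for the budgeted step: scan for docs at exactly this cost.
        results += [(budget, d.doc_id) for d in text_matches
                    if relaxation_cost(query_nodes, d) == budget]
        if len(results) >= k:
            break
    return results[:k]


# Hypothetical sample data: dimension 0 is location, dimension 1 is topic.
DOCS = [
    Doc("d1", frozenset({"pizza", "restaurant"}), (("US", "CA", "SF"), ("Food", "Italian"))),
    Doc("d2", frozenset({"pizza"}), (("US", "CA", "LA"), ("Food", "Italian"))),
    Doc("d3", frozenset({"pizza"}), (("US", "NY", "NYC"), ("Food", "American"))),
    Doc("d4", frozenset({"sushi"}), (("US", "CA", "SF"), ("Food", "Japanese"))),
]
QUERY_NODES = (("US", "CA", "SF"), ("Food", "Italian"))

print(top_k(DOCS, ["pizza"], QUERY_NODES, k=2))
```

With this sample data, `d1` satisfies both restrictions exactly, `d2` needs the location restriction lifted one level (from SF to CA), and `d3` needs three levels of relaxation in total, so a request for two results stops at a budget of 1. This scan touches every text match at every budget; the algorithms described in the abstract exist precisely to avoid that work.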
