Conference: Data Engineering, ICDE, 2009 IEEE 25th International Conference on

Weighted Proximity Best-Joins for Information Retrieval


Abstract

We consider the problem of efficiently computing weighted proximity best-joins over multiple lists, with applications in information retrieval and extraction. We are given a multi-term query, and for each query term, a list of all its matches with scores, sorted by locations. The problem is to find the overall best matchset, consisting of one match from each list, such that the combined score according to a scoring function is maximized. We study three types of functions that consider both individual match scores and proximity of match locations in scoring a matchset. We present algorithms that exploit the properties of the scoring functions in order to achieve time complexities linear in the size of the match lists. Experiments show that these algorithms greatly outperform the naive algorithm based on taking the cross product of all match lists. Finally, we extend our algorithms for an alternative problem definition applicable to information extraction, where we need to find all good matchsets in a document.
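To make the problem statement concrete, the following is a minimal sketch of the naive cross-product baseline that the abstract mentions. It is not the paper's linear-time method. The abstract does not define the three families of scoring functions, so `window_score` is a hypothetical stand-in (sum of match scores minus a penalty on the width of the window spanned by the matches), and the names `Match`, `naive_best_join`, and `distance_penalty` are assumptions introduced only for illustration.

```python
from itertools import product
from typing import List, NamedTuple, Optional, Sequence, Tuple


class Match(NamedTuple):
    location: int  # position of the match in the document (e.g. token offset)
    score: float   # how well this match fits its query term


def window_score(matchset: Sequence[Match], distance_penalty: float = 0.1) -> float:
    """Hypothetical scoring function (an assumption, not taken from the paper).

    Rewards high individual match scores and penalizes the width of the
    window that the matches span, so tight clusters of good matches win.
    """
    locations = [m.location for m in matchset]
    width = max(locations) - min(locations)
    return sum(m.score for m in matchset) - distance_penalty * width


def naive_best_join(
    match_lists: Sequence[List[Match]],
) -> Tuple[Optional[Tuple[Match, ...]], float]:
    """Try every matchset: one match from each list (the full cross product).

    Runs in time proportional to the product of the list lengths, which is
    the cost the paper's algorithms are designed to avoid.
    """
    best: Optional[Tuple[Match, ...]] = None
    best_score = float("-inf")
    if not match_lists:
        return best, best_score
    # product() yields nothing if any list is empty, so `best` stays None.
    for candidate in product(*match_lists):
        candidate_score = window_score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score
```

Calling `naive_best_join` with one location-sorted list of `Match` values per query term returns the highest-scoring matchset under the assumed function, or `None` when some term has no match. The enumeration is exponential in the number of query terms, which is why a baseline like this is useful mainly as a correctness reference for faster algorithms.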
