Journal: Information retrieval

Identifying top relevant dates for implicit time sensitive queries

Abstract

Despite clear improvements in temporal search and retrieval applications, current search engines remain largely unaware of the temporal dimension. In most cases, systems only let the user restrict the search to a particular time period or rely on an explicitly specified time span. If the user does not state a temporal intent explicitly (e.g., "philip seymour hoffman"), search engines are likely to fail to present an overall historical perspective of the topic and, in most such cases, retrieve only the most recent results. One possible solution to this shortcoming is to understand the different time periods of the query. In this context, most state-of-the-art methodologies consider any occurrence of a temporal expression in web documents and other web data as equally relevant to an implicit time sensitive query. To approach this problem more adequately, we propose in this paper to detect the temporal expressions that are relevant to the query. Unlike previous metadata and query log-based approaches, we show how to achieve this goal based on information extracted from document content. Rather than focusing only on detecting the most obvious date, we are also interested in retrieving the set of dates that are relevant to the query. Towards this goal, we define a general similarity measure that makes use of co-occurrences of words and years based on corpus statistics, and a classification methodology that identifies the set of top relevant dates for a given implicit time sensitive query while filtering out the non-relevant ones. Experiments conducted against several baselines on web corpora collections show that our approach offers promising results in the field of temporal information retrieval (T-IR).
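
The abstract does not give the similarity measure or the classifier, so the following is only a minimal sketch of the general idea of scoring candidate years by their co-occurrence with the query words. It makes several assumptions that are not from the paper: the Dice coefficient stands in for the similarity measure, a fixed threshold stands in for the classification step, and the names `rank_years`, `dice`, and `TOY_DOCS`, along with the toy snippets themselves, are hypothetical and for illustration only.

```python
import re
from collections import defaultdict

# Candidate dates are restricted to four-digit years (an assumption of this sketch).
YEAR_RE = re.compile(r"(1[0-9]{3}|20[0-9]{2})")

# Toy snippets, invented for illustration; not data from the paper.
TOY_DOCS = [
    "Hoffman won the award in 2006 for Capote",
    "Philip Seymour Hoffman received an Oscar in 2006",
    "Philip Seymour Hoffman died in 2014",
    "The site was last updated in 2019",
]


def dice(a: set, b: set) -> float:
    """Dice coefficient between two sets of document ids."""
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def rank_years(query: str, docs: list, threshold: float = 0.5) -> list:
    """Return (year, score) pairs whose mean co-occurrence with the query
    words reaches the threshold, sorted by descending score."""
    # Map each token to the set of documents that contain it.
    postings = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        for token in set(re.findall(r"\w+", doc.lower())):
            postings[token].add(doc_id)

    query_words = re.findall(r"\w+", query.lower())
    years = [t for t in postings if YEAR_RE.fullmatch(t)]

    scored = []
    for year in years:
        sims = [dice(postings.get(w, set()), postings[year]) for w in query_words]
        score = sum(sims) / len(sims) if sims else 0.0
        # Threshold as a stand-in for a learned relevant/non-relevant classifier.
        if score >= threshold:
            scored.append((year, score))
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    for year, score in rank_years("philip seymour hoffman", TOY_DOCS):
        print(year, round(score, 3))
```

On the toy snippets, the year that appears only in an unrelated snippet (2019) never co-occurs with a query word and is filtered out, while the years that share snippets with the query words are kept. A real system would replace the fixed threshold with a trained classifier and compute the statistics over a much larger corpus.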
