Journal: Data in Brief
Event-Dataset: Temporal information retrieval and text classification dataset

Abstract

Recently, Temporal Information Retrieval (TIR) has attracted considerable attention in the information retrieval community. TIR exploits temporal dynamics in the retrieval process and harnesses both textual relevance and temporal relevance to fulfill the temporal information requirements of a user (Ur Rehman Khan et al., 2018). The focus time of a document is an important temporal aspect, defined as the time to which the content of the document refers (Jatowt et al., 2015; Jatowt et al., 2013; Morbidoni et al., 2018; Khan et al., 2018). To the best of our knowledge, no publicly available standard benchmark dataset exists that can comprehensively evaluate the performance of focus time assessment strategies. Considering these aspects, we have produced the Event-dataset, which comprises 35 queries and a set of news articles for each query. Formally, $C = \{Q_s, D_s\}$, where $C$ represents the dataset and $Q_s$ is the query set $Q_s = \{q_1, q_2, q_3, \ldots, q_{35}\}$. For each $q_i$ there is a set of news articles $q_i = \{d_r, d_{nr}\}$, where $d_r$ and $d_{nr}$ are the sets of relevant and non-relevant documents, respectively. Each query in the dataset represents a popular event. To annotate the articles as relevant or non-relevant, we employed a user-study-based evaluation method in which a group of postgraduate students manually assigned the articles to these two categories. We believe that such a dataset gives information retrieval researchers a benchmark for evaluating focus time assessment methods specifically and information retrieval methods generally.
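The structure described in the abstract (a query set in which each query carries a relevant and a non-relevant document set) could be represented roughly as follows. This is only a minimal sketch: the abstract does not describe the dataset's file format, so the class name `Query`, the use of string document IDs, and the helper `precision_at_k` are all illustrative assumptions rather than part of the published dataset.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Query:
    """One event query q_i with its annotated news articles (hypothetical layout)."""

    text: str
    relevant: set[str] = field(default_factory=set)      # d_r: IDs of relevant articles
    non_relevant: set[str] = field(default_factory=set)  # d_nr: IDs of non-relevant articles

    def label(self, doc_id: str) -> bool:
        """Return True for a relevant article, False for a non-relevant one."""
        if doc_id in self.relevant:
            return True
        if doc_id in self.non_relevant:
            return False
        raise KeyError(f"{doc_id!r} was not annotated for this query")


def precision_at_k(query: Query, ranked_doc_ids: list[str], k: int) -> float:
    """Fraction of the top-k ranked articles that were annotated as relevant."""
    top = ranked_doc_ids[:k]
    if not top:
        return 0.0
    return sum(1 for doc_id in top if doc_id in query.relevant) / len(top)


if __name__ == "__main__":
    # Made-up IDs for illustration only; they do not come from the dataset.
    q = Query("example event", relevant={"a", "b"}, non_relevant={"c"})
    print(precision_at_k(q, ["a", "c", "b"], k=2))
```

Under these assumptions, a retrieval or focus time method would be scored by ranking each query's articles and averaging a measure such as `precision_at_k` over all 35 queries.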