Event-Dataset: Temporal information retrieval and text classification dataset

Shafiq Ur Rehman Khan; Muhammad Arshad Islam

Abstract

Temporal Information Retrieval (TIR) has recently attracted considerable attention in the information retrieval community. TIR exploits temporal dynamics in the retrieval process and combines textual relevance with temporal relevance to meet a user's temporal information needs (Ur Rehman Khan et al., 2018). The focus time of a document is an important temporal aspect, defined as the time to which the content of the document refers (Jatowt et al., 2015; Jatowt et al., 2013; Morbidoni et al., 2018; Khan et al., 2018). To the best of our knowledge, no publicly available standard benchmark dataset exists that can comprehensively evaluate the performance of focus time assessment strategies. We have therefore produced the Event-dataset, which comprises 35 queries and a set of news articles for each query. Formally, $C = \{Q_s, D_s\}$, where $C$ is the dataset, $Q_s = \{q_1, q_2, q_3, \ldots, q_{35}\}$ is the query set, and each $q_i$ has a set of news articles $q_i = \{d_r, d_{nr}\}$, where $d_r$ and $d_{nr}$ are the sets of relevant and non-relevant documents, respectively. Each query in the dataset represents a popular event. To label the articles as relevant or non-relevant, we used a user-study-based evaluation method in which a group of postgraduate students manually annotated the articles. We believe this dataset gives information retrieval researchers a benchmark for evaluating focus time assessment methods in particular and information retrieval methods in general.
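The structure described in the abstract (a set of queries, each with relevant and non-relevant news articles) can be sketched in Python as below. This is only an illustrative in-memory layout: the class names `QueryEntry` and `EventCollection`, the `labeled_pairs` helper, and the toy strings are all assumptions made for this sketch, and the abstract does not specify how the published files are actually organized.

```python
from dataclasses import dataclass, field


@dataclass
class QueryEntry:
    """One query q_i with its annotated articles (hypothetical layout)."""
    query: str                                             # the event named by the query
    relevant: list[str] = field(default_factory=list)      # d_r
    non_relevant: list[str] = field(default_factory=list)  # d_nr


@dataclass
class EventCollection:
    """The collection C: a set of queries together with their documents."""
    entries: list[QueryEntry] = field(default_factory=list)

    def queries(self) -> list[str]:
        """Return the query set Q_s as a list of query strings."""
        return [e.query for e in self.entries]

    def labeled_pairs(self) -> list[tuple[str, str, int]]:
        """Flatten to (query, document, label); 1 = relevant, 0 = non-relevant."""
        pairs = []
        for e in self.entries:
            pairs += [(e.query, d, 1) for d in e.relevant]
            pairs += [(e.query, d, 0) for d in e.non_relevant]
        return pairs


# Toy placeholder data, not taken from the real dataset.
toy = EventCollection([
    QueryEntry("example event A", relevant=["doc a1", "doc a2"], non_relevant=["doc a3"]),
    QueryEntry("example event B", relevant=["doc b1"]),
])
pairs = toy.labeled_pairs()
```

Flattening to (query, document, label) triples is one convenient form for the text classification use mentioned in the title, while the per-query grouping suits retrieval-style evaluation.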

Bibliographic details

Source: Data in Brief, 2019, Issue 2, 8 pages
Authors: Shafiq Ur Rehman Khan; Muhammad Arshad Islam
Original format: PDF
Keywords: Information retrieval; Temporal; Text classification; Focus time assessment
