Identification of Tasks, Datasets, Evaluation Metrics, and Numeric Scores for Scientific Leaderboards Construction



Abstract

While the fast-paced inception of novel tasks and new datasets helps foster active research in a community towards interesting directions, keeping track of the abundance of research activity in different areas on different datasets is likely to become increasingly difficult. The community could greatly benefit from an automatic system able to summarize scientific results, e.g., in the form of a leaderboard. In this paper we build two datasets and develop a framework (TDMS-IE) aimed at automatically extracting task, dataset, metric and score from NLP papers, towards the automatic construction of leaderboards. Experiments show that our model outperforms several baselines by a large margin. Our model is a first step towards automatic leaderboard construction, e.g., in the NLP domain.

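The abstract presents extracted (task, dataset, metric, score) tuples as the raw material for leaderboards. The Python sketch below is a rough illustration of that final aggregation step only. It is not the paper's TDMS-IE method and does not model extraction at all. It assumes the tuples already exist, groups them by (task, dataset, metric), and ranks papers inside each group. The `TDMS` type, `build_leaderboards` function, `LOWER_IS_BETTER` set, paper names, and scores are all hypothetical placeholders.

```python
from collections import defaultdict
from typing import NamedTuple


class TDMS(NamedTuple):
    """Hypothetical record: one (task, dataset, metric, score) tuple plus its source paper."""
    paper: str
    task: str
    dataset: str
    metric: str
    score: float


# Assumption: metrics for which a lower value is better; every other metric ranks descending.
LOWER_IS_BETTER = {"Perplexity", "WER"}


def build_leaderboards(tuples):
    """Hypothetical helper: group tuples by (task, dataset, metric) and rank papers by score."""
    boards = defaultdict(list)
    for t in tuples:
        boards[(t.task, t.dataset, t.metric)].append((t.paper, t.score))
    # Sort each group best-first, flipping direction for lower-is-better metrics.
    return {
        key: sorted(entries, key=lambda e: e[1], reverse=key[2] not in LOWER_IS_BETTER)
        for key, entries in boards.items()
    }


if __name__ == "__main__":
    # Made-up tuples for illustration only; these are not reported results.
    extracted = [
        TDMS("paper-A", "Question Answering", "SQuAD", "F1", 85.0),
        TDMS("paper-B", "Question Answering", "SQuAD", "F1", 90.0),
        TDMS("paper-C", "Language Modeling", "WikiText-103", "Perplexity", 40.0),
        TDMS("paper-D", "Language Modeling", "WikiText-103", "Perplexity", 30.0),
    ]
    for (task, dataset, metric), ranking in build_leaderboards(extracted).items():
        print(f"{task} / {dataset} / {metric}")
        for rank, (paper, score) in enumerate(ranking, start=1):
            print(f"  {rank}. {paper}: {score}")
```

Making the ranking direction explicit per metric matters here. Metrics such as perplexity or word error rate improve as they decrease, so a single descending sort would put the worst entry first for them.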

Bibliographic details

Source
Annual Meeting of the Association for Computational Linguistics | 2019 | cxxxiv, pp. 4609-5267 | 11 pages
Authors
Yufang Hou; Charles Jochim; Martin Gleize; Francesca Bonin; Debasis Ganguly
Original format: PDF
CLC classification: Programming, software engineering

Similar documents

1. A Systematic Evaluation and Benchmark for Person Re-Identification: Features, Metrics, and Datasets [J] . Karanam Srikrishna, Gou Mengran, Wu Ziyan, IEEE Transactions on Pattern Analysis and Machine Intelligence . 2019, no. 3

2. Scoring of senescence signalling in multiple human tumour gene expression datasets, identification of a correlation between senescence score and drug toxicity in the NCI60 panel and a pro-inflammatory signature correlating with survival advantage in peritoneal mesothelioma [J] . Kyle Lafferty-Whyte, Alan Bilsland, Claire J Cairney, BMC Genomics . 2010, no. 1

3. Evaluation of copy-move forgery detection: datasets and evaluation metrics [J] . Al-Qershi Osamah M., Khoo Bee Ee, Multimedia Tools and Applications . 2018, no. 24

4. Identification of Tasks, Datasets, Evaluation Metrics, and Numeric Scores for Scientific Leaderboards Construction [C] . Yufang Hou, Charles Jochim, Martin Gleize, Annual meeting of the Association for Computational Linguistics . 2019

5. Image Captioning: A Survey of Existing Issues on Datasets, Evaluation Metrics and Methods [D] . zhou, liwan . 2020

6. Scoring of senescence signalling in multiple human tumour gene expression datasets, identification of a correlation between senescence score and drug toxicity in the NCI60 panel and a pro-inflammatory signature correlating with survival advantage in peritoneal mesothelioma [O] . Kyle Lafferty-Whyte, Alan Bilsland, Claire J Cairney, 2010

7. A Systematic Evaluation and Benchmark for Person Re-Identification: Features, Metrics, and Datasets [O] . Karanam, Srikrishna, Gou, Mengran, Wu, Ziyan, 2017

