Identifying entities from scientific publications: A comparison of vocabulary- and model-based methods

Yan Erjia; Zhu Yongjun

Abstract

The objective of this study is to evaluate the performance of five entity extraction methods for the task of identifying entities from scientific publications: two vocabulary-based methods (one keyword-based and one Wikipedia-based) and three model-based methods (conditional random fields (CRF), CRF with a keyword-based dictionary, and CRF with a Wikipedia-based dictionary). These methods are applied to an annotated test set of publications in computer science. Precision, recall, accuracy, area under the ROC curve, and area under the precision-recall curve are employed as the evaluative indicators. Results show that the model-based methods outperform the vocabulary-based ones, and that among the model-based methods CRF with a keyword-based dictionary performs best. Between the two vocabulary-based methods, the keyword-based one has a higher recall and the Wikipedia-based one has a higher precision. The findings of this study help inform the understanding of informetric research at a more granular level. (C) 2015 Elsevier Ltd. All rights reserved.
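
As a rough illustration of the vocabulary-based approach and three of the indicators named in the abstract (precision, recall, accuracy), the following is a minimal sketch, not the authors' implementation. The greedy longest-match strategy, the token-level scoring, the function names, and the toy vocabulary and gold labels are all assumptions made for this example; the abstract does not specify any of them.

```python
from typing import List, Set, Tuple


def tag_with_vocabulary(
    tokens: List[str],
    vocabulary: Set[Tuple[str, ...]],
    max_len: int = 3,
) -> List[int]:
    """Label each token 1 if it lies inside a vocabulary phrase, else 0.

    Hypothetical helper: uses greedy longest-match lookup, which is an
    assumption and not a detail taken from the paper.
    """
    lowered = [t.lower() for t in tokens]
    labels = [0] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in vocabulary:
                matched = n
                break
        if matched:
            for j in range(i, i + matched):
                labels[j] = 1
            i += matched
        else:
            i += 1
    return labels


def precision_recall_accuracy(
    gold: List[int], predicted: List[int]
) -> Tuple[float, float, float]:
    """Token-level precision, recall, and accuracy for binary labels."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)
    tn = sum(1 for g, p in pairs if g == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return precision, recall, accuracy


# Toy data, invented for illustration only.
tokens = "We train conditional random fields for entity extraction".split()
vocabulary = {("conditional", "random", "fields"), ("train",)}
gold = [0, 0, 1, 1, 1, 0, 1, 1]

predicted = tag_with_vocabulary(tokens, vocabulary)
precision, recall, accuracy = precision_recall_accuracy(gold, predicted)
print(predicted)
print(precision, recall, accuracy)
```

In this toy case the vocabulary wrongly tags "train" and misses "entity extraction", so precision and recall both fall below 1. The model-based methods in the study replace the lookup step with a trained CRF, optionally using the dictionary as an input feature.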

Bibliographic details

Source
Journal of Informetrics | 2015, Issue 3 | pp. 455-465 | 11 pages
Authors
Yan Erjia; Zhu Yongjun
Author affiliations

Drexel Univ, Coll Comp & Informat, Philadelphia, PA 19104 USA;

Drexel Univ, Coll Comp & Informat, Philadelphia, PA 19104 USA;

Original format: PDF
Language: English
Keywords
Entity extraction; Vocabulary; Dictionary; Conditional random fields; Content aware;


