Journal of Informetrics

Identifying entities from scientific publications: A comparison of vocabulary- and model-based methods


Abstract

The objective of this study is to evaluate the performance of five entity extraction methods for the task of identifying entities from scientific publications, including two vocabulary-based methods (one keyword-based and one Wikipedia-based) and three model-based methods (conditional random fields (CRF), CRF with keyword-based dictionary, and CRF with Wikipedia-based dictionary). These methods are applied to an annotated test set of publications in computer science. Precision, recall, accuracy, area under the ROC curve, and area under the precision-recall curve are employed as the evaluative indicators. Results show that the model-based methods outperform the vocabulary-based ones, among which CRF with keyword-based dictionary has the best performance. Between the two vocabulary-based methods, the keyword-based one has a higher recall and the Wikipedia-based one has a higher precision. The findings of this study help inform the understanding of informetric research at a more granular level. (C) 2015 Elsevier Ltd. All rights reserved.
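To make the vocabulary-based idea and two of the indicators concrete, here is a minimal sketch in Python. It is not the paper's implementation: the function names, the toy vocabulary, the sample sentence, and the gold annotations are all hypothetical, and matching is plain case-insensitive substring search. It only illustrates how a term list can be matched against text and how precision and recall are then computed against annotated entities.

```python
def extract_entities(text, vocabulary):
    """Return the vocabulary terms found in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return {term for term in vocabulary if term.lower() in lowered}


def precision_recall(predicted, gold):
    """Compute precision and recall of a predicted entity set against a gold set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy inputs (assumptions for illustration only, not data from the study).
vocabulary = {"conditional random fields", "support vector machine", "random"}
text = "We train conditional random fields to label entity mentions in abstracts."
gold = {"conditional random fields", "entity mentions"}

predicted = extract_entities(text, vocabulary)
precision, recall = precision_recall(predicted, gold)
print(sorted(predicted), precision, recall)
```

In this toy case the overly general term "random" is matched as an entity, which lowers precision, while the unlisted phrase "entity mentions" is missed, which lowers recall. This is the kind of trade-off the abstract describes between broader and stricter vocabularies.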
