Journal: Information Processing & Management

Information extraction from research papers using conditional random fields


Abstract

With the increasing use of research paper search engines, such as CiteSeer, for both literature search and hiring decisions, the accuracy of such systems is of paramount importance. This article employs conditional random fields (CRFs) for the task of extracting various common fields from the headers and citations of research papers. CRFs provide a principled way of incorporating various local features, external lexicon features, and global layout features. The basic theory of CRFs is becoming well understood, but best practices for applying them to real-world data require additional exploration. We make an empirical exploration of several factors, including variations on Gaussian, Laplace, and hyperbolic-L1 priors for improved regularization, and several classes of features. Based on CRFs, we further present a novel approach for constraint co-reference information extraction; i.e., improving extraction performance given that we know some citations refer to the same publication. On a standard benchmark dataset, we achieve new state-of-the-art performance, reducing error in average F1 by 36%, and word error rate by 78%, in comparison with the previous best SVM results. Accuracy compares even more favorably against HMMs. On four co-reference IE datasets, our system significantly improves extraction performance, with an error rate reduction of 6-14%. (c) 2005 Elsevier Ltd. All rights reserved.
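The abstract names three priors used for regularization but gives no formulas. As a rough illustration only, the sketch below writes each prior as a penalty added to the negative log-likelihood of the CRF weights. The forms used here are the commonly cited ones (squared weights for the Gaussian, absolute values for the Laplace, and log cosh as the smooth hyperbolic-L1 variant); treating them as the exact forms used in the article is an assumption, and the function and parameter names (`sigma2`, `alpha`) are hypothetical.

```python
import math


def gaussian_penalty(weights, sigma2=1.0):
    """Gaussian prior penalty (up to a constant): sum(w^2) / (2 * sigma2)."""
    return sum(w * w for w in weights) / (2.0 * sigma2)


def laplace_penalty(weights, alpha=1.0):
    """Laplace prior penalty (up to a constant): alpha * sum(|w|)."""
    return alpha * sum(abs(w) for w in weights)


def _log_cosh(w):
    # Numerically stable log(cosh(w)) = |w| + log(1 + exp(-2|w|)) - log(2).
    a = abs(w)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)


def hyperbolic_l1_penalty(weights, alpha=1.0):
    """Assumed hyperbolic-L1 penalty: alpha * sum(log cosh(w)).

    Behaves like w^2 / 2 near zero and like |w| for large weights,
    so it is differentiable everywhere, unlike the Laplace penalty.
    """
    return alpha * sum(_log_cosh(w) for w in weights)


if __name__ == "__main__":
    # Hypothetical weight vector, for illustration only.
    w = [0.5, -2.0, 0.0, 3.0]
    print("gaussian     ", gaussian_penalty(w, sigma2=1.0))
    print("laplace      ", laplace_penalty(w, alpha=1.0))
    print("hyperbolic-L1", hyperbolic_l1_penalty(w, alpha=1.0))
```

In a training loop, one of these values would be added to the negative log-likelihood before optimization; which prior works best is the kind of empirical question the abstract says the article explores.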
