Conference: IAPR Workshop on Document Analysis Systems

Investigator Name Recognition from Medical Journal Articles: A Comparative Study of SVM and Structural SVM


Abstract

Automated extraction of bibliographic information from journal articles is key to the affordable creation and maintenance of citation databases such as MEDLINE®. A newly required bibliographic field in this database is "Investigator Names": the names of people who contributed to the research addressed in the article but who are not listed as authors. Since such names often number several score or more, their manual entry is prohibitive. Automated extraction of these names is a Named Entity Recognition (NER) problem, but it differs from typical NER because the text containing the names lacks normal English grammar. In addition, since MEDLINE conventions require names to be expressed in a particular format, both the first and last names of each investigator must be identified, which is an additional challenge. We seek to automate this task through two machine learning approaches: Support Vector Machine (SVM) and structural SVM, both of which show good performance at the word and chunk levels. In contrast to traditional SVM, structural SVM attempts to learn a sequence by using contextual label features in addition to observational features. It outperforms SVM at the initial learning stage, when no contextual observation features are used. However, once contextual observation features from neighboring tokens are added, SVM performance improves to match or slightly exceed that of the structural SVM.
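The distinction between observation features and contextual observation features can be illustrated with a minimal Python sketch. A per-token classifier such as an SVM labels each token independently, so any information about neighboring tokens has to be encoded in that token's own feature set. The feature names, the window size, and the sample token list below are assumptions made for illustration only; they are not the feature set or data used in the paper.

```python
def observation_features(token):
    """Features computed from a single token alone (hypothetical feature set)."""
    return {
        "lower": token.lower(),
        "is_capitalized": token[:1].isupper(),
        # A single capital letter, optionally followed by periods, e.g. "J."
        "is_initial": len(token.rstrip(".")) == 1 and token[:1].isupper(),
        "ends_with_comma": token.endswith(","),
    }


def contextual_features(tokens, i, window=1):
    """Observation features of token i plus those of neighbors within `window`.

    Each feature name is prefixed with the signed offset of the token it
    came from, so "-1:lower" is the lowercased previous token.
    """
    feats = {}
    for offset in range(-window, window + 1):
        j = i + offset
        if 0 <= j < len(tokens):
            for name, value in observation_features(tokens[j]).items():
                feats[f"{offset:+d}:{name}"] = value
        else:
            # Mark positions that fall outside the token sequence.
            feats[f"{offset:+d}:pad"] = True
    return feats


# Hypothetical token sequence resembling an investigator list.
tokens = ["J.", "Smith,", "Boston;", "A.", "Jones,", "Chicago"]
features = contextual_features(tokens, 1, window=1)
```

With `window=0` the function returns only the token's own observation features, which corresponds to the setting without contextual observation features. A structural SVM would additionally condition on the labels of neighboring tokens, which this sketch does not model.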
