9th IAPR Workshop on Document Analysis Systems, 2010

Investigator Name Recognition from Medical Journal Articles: A Comparative Study of SVM and Structural SVM

Abstract

Automated extraction of bibliographic information from journal articles is key to the affordable creation and maintenance of citation databases such as MEDLINE®. A newly required bibliographic field in this database is "Investigator Names": the names of people who contributed to the research reported in the article but are not listed as authors. Because an article often lists a large number of such names, several score or more, entering them manually is prohibitively costly. Automatically extracting these names is a Named Entity Recognition (NER) problem, but it differs from typical NER because the text containing the names lacks normal English grammar. In addition, since MEDLINE conventions require names to be expressed in a particular format, both the first and the last name of each investigator must be identified, which is an additional challenge. We seek to automate this task with two machine learning approaches, the Support Vector Machine (SVM) and the structural SVM, both of which perform well at the word and chunk levels. Unlike a traditional SVM, a structural SVM learns to label a sequence by using contextual label features in addition to observational features. When contextual observation features are not used, the structural SVM outperforms the SVM at the initial learning stage. However, once contextual features from neighboring tokens are added, SVM performance improves to match or slightly exceed that of the structural SVM.
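
The abstract's key distinction is between purely observational features of a token and "contextual observation features" taken from neighboring tokens, which is what lets a per-token SVM catch up with the structural SVM. The following is a minimal sketch of that idea, not the authors' feature set: the feature names (`is_initial`, `has_comma`, and so on), the window width, and the whitespace tokenization are all hypothetical. With `width=0` each token sees only its own features; with `width=1` the features of the previous and next tokens are copied in under offset-prefixed names.

```python
from typing import Dict, List


# Hypothetical observational features of one token (not the paper's feature set).
def token_features(token: str) -> Dict[str, float]:
    return {
        "lower=" + token.lower(): 1.0,
        "is_capitalized": float(token[:1].isupper()),
        # e.g. "A." used as a middle initial
        "is_initial": float(len(token) == 2 and token[0].isupper() and token[1] == "."),
        "has_comma": float(token.endswith(",")),
    }


def window_features(tokens: List[str], i: int, width: int = 1) -> Dict[str, float]:
    """Features for tokens[i]; width > 0 adds contextual observation features
    copied from neighboring tokens, prefixed with their relative offset."""
    feats: Dict[str, float] = {}
    for offset in range(-width, width + 1):
        j = i + offset
        # Positions outside the text get a single padding feature.
        base = token_features(tokens[j]) if 0 <= j < len(tokens) else {"pad": 1.0}
        for name, value in base.items():
            feats[f"{offset:+d}:{name}"] = value
    return feats


tokens = ["John", "A.", "Smith,"]
print(window_features(tokens, 0, width=0))  # observation features of "John" only
print(window_features(tokens, 0, width=1))  # plus features of offsets -1 (padding) and +1
```

Dictionaries of this shape can be vectorized and passed to an ordinary linear SVM (for instance scikit-learn's `DictVectorizer` followed by `LinearSVC`); each token is then classified independently, and any sequence information comes only from the copied neighbor features.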
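
A structural SVM for sequence labeling is commonly set up as a linear-chain model (as in the SVM^hmm package): the score of a label sequence is the sum of per-token emission scores computed from observation features plus transition scores for adjacent label pairs, and the best-scoring sequence is found with Viterbi decoding. The sketch below shows only that decoding step and assumes the paper uses such a linear-chain formulation, which the abstract does not state explicitly. The label set (`FN` for first-name tokens, `LN` for last-name tokens, `O` for everything else) and all scores are hypothetical; learning the weights is not shown.

```python
from typing import Dict, List, Sequence, Tuple


def viterbi(emission: Sequence[Dict[str, float]],
            transition: Dict[Tuple[str, str], float],
            labels: Sequence[str]) -> List[str]:
    """Return the label sequence maximizing emission + transition scores.

    emission[i][y]: score of label y at position i (from observation features).
    transition[(y_prev, y)]: score of the label pair; missing pairs score 0.
    """
    if not emission:
        return []
    # best[i][y]: best score of any labeling of positions 0..i that ends in y
    best: List[Dict[str, float]] = [{y: emission[0][y] for y in labels}]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(emission)):
        best.append({})
        back.append({})
        for y in labels:
            # Best previous label given that position i takes label y.
            prev, score = max(
                ((p, best[i - 1][p] + transition.get((p, y), 0.0)) for p in labels),
                key=lambda item: item[1],
            )
            best[i][y] = score + emission[i][y]
            back[i][y] = prev
    # Trace back from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for i in range(len(emission) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


labels = ["FN", "LN", "O"]
# Hypothetical scores for the two tokens "Mary Jones".
emission = [{"FN": 1.0, "LN": 1.2, "O": 0.0}, {"FN": 0.0, "LN": 1.0, "O": 0.0}]
print(viterbi(emission, {}, labels))
print(viterbi(emission, {("FN", "LN"): 1.0, ("LN", "LN"): -1.0}, labels))
```

Without transition scores the decoder simply picks each token's locally best label (`LN`, `LN`); a learned preference for `FN` followed by `LN` flips the first token to `FN`. This is the sense in which label context can help before any neighboring observation features are added.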

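The abstract reports results at both the word and the chunk level. At the chunk level a whole investigator name must be recovered correctly, so a single mislabeled token costs an entire name. A minimal sketch of the difference follows, under these assumptions: the hypothetical `FN`/`LN`/`O` labels are used, separators such as commas are tokenized as `O`, names appear in first-name-then-last-name order (so an `FN` directly after an `LN` starts a new name), and token accuracy stands in for a word-level measure. The paper's exact metrics are not given in the abstract.

```python
from typing import List, Set, Tuple

# (start, end, labels) for one name; end is exclusive.
Chunk = Tuple[int, int, Tuple[str, ...]]


def name_chunks(labels: List[str]) -> Set[Chunk]:
    """Group runs of FN/LN labels into name chunks."""
    chunks: Set[Chunk] = set()
    start = None
    prev = "O"
    for i, label in enumerate(list(labels) + ["O"]):  # sentinel closes the last run
        boundary = label == "O" or (label == "FN" and prev == "LN")
        if start is not None and boundary:
            chunks.add((start, i, tuple(labels[start:i])))
            start = None
        if label != "O" and start is None:
            start = i
        prev = label
    return chunks


def word_accuracy(gold: List[str], pred: List[str]) -> float:
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def chunk_prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Precision, recall, F1 over exactly matching name chunks."""
    g, p = name_chunks(gold), name_chunks(pred)
    tp = len(g & p)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Tokens: "John Smith , Mary Jones"; one token of the second name is mislabeled.
gold = ["FN", "LN", "O", "FN", "LN"]
pred = ["FN", "LN", "O", "LN", "LN"]
print(word_accuracy(gold, pred), chunk_prf(gold, pred))
```

Here four of five tokens are right, but only one of the two names is recovered, so the chunk-level scores are noticeably lower than the word-level score.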