Conference: Workshop on computational semantics in clinical text

Evaluating the Use of Empirically Constructed Lexical Resources for Named Entity Recognition


Abstract

Because of privacy concerns and the expense of creating an annotated corpus, existing small annotated corpora may not contain enough examples for statistical learning to extract all named entities precisely. In this work, we evaluate the value of automatically generated features based on distributional semantics in machine-learning named entity recognition (NER). The features we generated and experimented with include n-nearest words, support vector machine (SVM) regions, and term clustering, all of which are considered semantic (or distributional semantic) features. Adding the n-nearest words feature to a baseline system that extracts medical concepts from clinical notes produced a greater increase in F-score than adding a manually constructed lexicon. Although the need for relatively small annotated corpora for retraining is not obviated, lexicons empirically derived from unannotated text can not only supplement manually created lexicons but also replace them. This phenomenon is observed when extracting concepts from both biomedical literature and clinical notes.
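To make the idea of an n-nearest words feature concrete, the following is a minimal sketch and not the authors' implementation. It assumes simple windowed co-occurrence counts as the distributional vectors and cosine similarity as the nearness measure; the function names (`cooccurrence_vectors`, `n_nearest_words`, `token_features`), the window size, and the feature keys are all hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Build sparse context-count vectors from unannotated, tokenized sentences."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def n_nearest_words(word, vectors, n=3):
    """Return up to n words whose context vectors are most similar to `word`."""
    if word not in vectors:
        return []  # unseen word: no distributional evidence
    scored = [
        (cosine(vectors[word], vec), other)
        for other, vec in vectors.items()
        if other != word
    ]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for score, other in scored[:n] if score > 0]


def token_features(tokens, i, vectors, n=3):
    """Features for token i: the word itself plus its n nearest words (assumed layout)."""
    features = {"word": tokens[i]}
    for rank, neighbor in enumerate(n_nearest_words(tokens[i], vectors, n), start=1):
        features[f"nearest_{rank}"] = neighbor
    return features
```

Feature dictionaries of this shape could be passed to a sequence labeler alongside the baseline features. The vectors are built from unannotated text only, which is the property the abstract relies on.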

Bibliographic details

  • Source
  • Conference location: Potsdam (DE)
  • Author affiliations

    Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA;

    School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX, USA;

    Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA;

    Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA;

    Department of Biomedical Informatics, Arizona State University, Phoenix, AZ, USA;

  • Conference organizer
  • Original format: PDF
  • Language: eng
  • Chinese Library Classification
  • Keywords
