Conference: 9th International Conference on Language Resources and Evaluation

Use of unsupervised word classes for entity recognition: Application to the detection of disorders in clinical reports

Abstract

Unsupervised word classes induced from unannotated text corpora are increasingly used to support tasks addressed by supervised classification, such as standard named entity detection. This paper studies the contribution of unsupervised word classes to a medical entity detection task, with two specific questions: How do unsupervised word classes compare to available knowledge-based semantic classes? Does syntactic information help produce unsupervised word classes with better properties? We design and test two syntax-based methods to produce word classes: one applies the Brown clustering algorithm to syntactic dependencies; the other collects latent categories created by a PCFG-LA parser. When added to non-semantic features, knowledge-based semantic classes gain 7.28 points of F-measure. In the same setting, basic unsupervised word classes gain 4.16 points, reaching about 60% of the contribution of knowledge-based semantic classes and outperforming Wikipedia; adding PCFG-LA unsupervised word classes gains roughly one more point, for a total of 5.11 points, or 70%. Unsupervised word classes could therefore provide a useful semantic back-off in domains where no knowledge-based semantic classes are available. Combining knowledge-based and basic unsupervised classes gains 8.33 points, so unsupervised classes remain useful even when rich knowledge-based classes exist.
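
The relative contributions quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration: the four gains are taken from the abstract, while the dictionary keys and the helper name `relative_contribution` are made up here and do not come from the paper. It suggests that the "60%" figure is a rounding of roughly 57%, while "70%" matches closely.

```python
# F-measure gains (in points) over the non-semantic feature baseline,
# copied from the abstract. The key names are illustrative labels only.
GAINS = {
    "knowledge_based": 7.28,
    "basic_unsupervised": 4.16,
    "basic_unsupervised_plus_pcfg_la": 5.11,
    "knowledge_based_plus_basic_unsupervised": 8.33,
}


def relative_contribution(gain, reference=GAINS["knowledge_based"]):
    """Express a gain as a percentage of the knowledge-based gain."""
    return 100.0 * gain / reference


for name, gain in GAINS.items():
    share = relative_contribution(gain)
    print(f"{name}: +{gain:.2f} pt ({share:.1f}% of the knowledge-based gain)")

# Extra gain from adding basic unsupervised classes on top of knowledge-based ones.
extra = GAINS["knowledge_based_plus_basic_unsupervised"] - GAINS["knowledge_based"]
print(f"extra gain from combining: +{extra:.2f} pt")
```

The final line gives the margin behind the closing claim of the abstract: the combined system gains about one point more than knowledge-based classes alone.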
