Conference: Pacific-Asia Conference on Knowledge Discovery and Data Mining
Category Multi-representation: A Unified Solution for Named Entity Recognition in Clinical Texts


Abstract

Clinical Named Entity Recognition (CNER), the task of identifying entity boundaries in clinical texts, is essential for many applications. Previous methods usually follow traditional NER approaches, which rely heavily on language-specific features (i.e., linguistic features and lexicons) and high-quality annotated data. However, the limited availability of annotated data and the informal nature of clinical texts make CNER more challenging. In this paper, we propose a novel method that learns multiple representations for each category, namely category multi-representation (CMR), which captures the semantic relatedness between words and clinical categories from different perspectives. CMR is learned from a large-scale unannotated corpus and a small set of annotated data, which greatly reduces the human effort required. Instead of language-specific features, our proposed method uses more evidential features without any additional NLP tools and adapts easily across languages. We conduct a series of experiments to verify that our new CMR features significantly improve NER performance without leveraging any external lexicons.
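The abstract does not say how the category representations are computed, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that word embeddings have already been learned from an unannotated corpus, that a few annotated seed words are available per category, and that several representations per category can be approximated by clustering the seed embeddings (a simple k-means using cosine similarity is swapped in here). The embedding values, the category names DRUG and DISEASE, and all function names are hypothetical.

```python
import math
import random

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def category_representations(seed_vectors, k, iterations=10, seed=0):
    """Assumed stand-in: k-means over seed embeddings yields k vectors per category."""
    rng = random.Random(seed)
    centroids = rng.sample(seed_vectors, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for vec in seed_vectors:
            nearest = max(range(k), key=lambda i: cosine(vec, centroids[i]))
            clusters[nearest].append(vec)
        # Keep the previous centroid when a cluster receives no members.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids

def cmr_features(word_vector, representations):
    """Relatedness of one word to every representation of every category."""
    return {
        category: [cosine(word_vector, rep) for rep in reps]
        for category, reps in representations.items()
    }

# Hypothetical toy embeddings standing in for vectors learned from raw text.
embeddings = {
    "aspirin": [0.9, 0.1, 0.0],
    "ibuprofen": [0.8, 0.2, 0.1],
    "insulin": [0.7, 0.0, 0.3],
    "diabetes": [0.1, 0.9, 0.2],
    "asthma": [0.0, 0.8, 0.3],
    "pneumonia": [0.2, 0.7, 0.4],
    "metformin": [0.85, 0.1, 0.2],
}

# Hypothetical small annotated set: a few seed words per clinical category.
seeds = {
    "DRUG": ["aspirin", "ibuprofen", "insulin"],
    "DISEASE": ["diabetes", "asthma", "pneumonia"],
}

representations = {
    category: category_representations([embeddings[w] for w in words], k=2)
    for category, words in seeds.items()
}

# Features for a word that is absent from the annotated seeds.
features = cmr_features(embeddings["metformin"], representations)
print(features)
```

In a full tagger, per-word similarity scores of this kind would be passed to a sequence labeler alongside its other features. Because they depend only on embeddings and a small seed set, they need no language-specific lexicon, which is the property the abstract emphasizes.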