Journal of biomedical informatics

Unsupervised biomedical named entity recognition: Experiments with clinical and biological texts

Abstract

Named entity recognition is a crucial component of biomedical natural language processing, enabling information extraction and ultimately reasoning over and knowledge discovery from text. Much progress has been made in the design of rule-based and supervised tools, but they are often genre and task dependent. As such, adapting them to different genres of text or identifying new types of entities requires major effort in re-annotation or rule development. In this paper, we propose an unsupervised approach to extracting named entities from biomedical text. We describe a stepwise solution to tackle the challenges of entity boundary detection and entity type classification without relying on any handcrafted rules, heuristics, or annotated data. A noun phrase chunker followed by a filter based on inverse document frequency extracts candidate entities from free text. Classification of candidate entities into categories of interest is carried out by leveraging principles from distributional semantics. Experiments show that our system, especially the entity classification step, yields competitive results on two popular biomedical datasets of clinical notes and biological literature, and outperforms a baseline dictionary match approach. Detailed error analysis provides a road map for future work.
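The two steps named in the abstract, an inverse document frequency filter over candidate phrases and a classification step based on distributional semantics, can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no chunker, threshold, or similarity measure, so the mean-token-IDF criterion, the cosine similarity, the seed context vectors, and all function names below (`idf_scores`, `filter_candidates`, `cosine`, `classify`) are assumptions made for illustration. Candidates are assumed to come from a noun phrase chunker run on the same corpus.

```python
import math
from collections import Counter


def idf_scores(documents):
    """Map each token to log(N / df), where df counts documents containing it."""
    n_docs = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(doc))  # count each token once per document
    return {tok: math.log(n_docs / count) for tok, count in df.items()}


def filter_candidates(candidates, idf, threshold):
    """Keep phrases whose mean token IDF is at least `threshold` (assumed criterion)."""
    kept = []
    for phrase in candidates:
        scores = [idf.get(tok, 0.0) for tok in phrase.lower().split()]
        if scores and sum(scores) / len(scores) >= threshold:
            kept.append(phrase)
    return kept


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (dict-like)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(context, seed_contexts):
    """Return the category whose seed context vector is most similar to `context`."""
    return max(seed_contexts, key=lambda label: cosine(context, seed_contexts[label]))


# Toy corpus: each document is a list of lowercase tokens.
docs = [
    ["the", "patient", "has", "chest", "pain"],
    ["the", "patient", "denies", "fever"],
    ["the", "patient", "was", "given", "aspirin"],
]
idf = idf_scores(docs)

# Hypothetical chunker output; "the patient" occurs in every document, so its IDF is 0.
candidates = ["the patient", "chest pain", "aspirin"]
entities = filter_candidates(candidates, idf, threshold=0.5)

# Hypothetical seed context vectors for two categories of interest.
seeds = {
    "problem": Counter({"has": 2, "denies": 1}),
    "treatment": Counter({"given": 2, "prescribed": 1}),
}
label = classify(Counter({"was": 1, "given": 1}), seeds)
```

In this toy run the phrase that appears in every document is dropped, while the rarer phrases survive the filter; the context of "aspirin" shares the word "given" only with the treatment seeds, so it is assigned that category.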
