AMIA Annual Symposium Proceedings

A Study of Neural Word Embeddings for Named Entity Recognition in Clinical Text

Abstract

Clinical Named Entity Recognition (NER) is a critical task for extracting important patient information from clinical text to support clinical and translational research. This study explored the neural word embeddings derived from a large unlabeled clinical corpus for clinical NER. We systematically compared two neural word embedding algorithms and three different strategies for deriving distributed word representations. Two neural word embeddings were derived from the unlabeled Multiparameter Intelligent Monitoring in Intensive Care (MIMIC) II corpus (403,871 notes). The results from both 2010 i2b2 and 2014 Semantic Evaluation (SemEval) data showed that the binarized word embedding features outperformed other strategies for deriving distributed word representations. The binarized embedding features improved the F1-score of the Conditional Random Fields based clinical NER system by 2.3% on i2b2 data and 2.4% on SemEval data. The combined feature from the binarized embeddings and the Brown clusters improved the F1-score of the clinical NER system by 2.9% on i2b2 data and 2.7% on SemEval data. Our study also showed that the distributed word embedding features derived from a large unlabeled corpus can be better than the widely used Brown clusters. Further analysis found that the neural word embeddings captured a wide range of semantic relations, which could be discretized into distributed word representations to benefit the clinical NER system. The low-cost distributed feature representation can be adapted to any other clinical natural language processing research.
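
The abstract does not spell out how the embeddings are binarized, so the following is only a minimal sketch under stated assumptions. It assumes one simple per-dimension scheme: a value at or above the mean of that dimension's positive values becomes "U+", a value at or below the mean of its negative values becomes "B-", and everything else becomes "0". The function names `binarize_embeddings` and `token_features`, the symbol names, and the toy vectors are all hypothetical and are not taken from the paper.

```python
def binarize_embeddings(vectors):
    """Discretize real-valued embeddings into three symbols per dimension.

    vectors: dict mapping word -> list of floats (all the same length).
    Returns a dict mapping word -> list of "U+", "B-", or "0".
    Assumed scheme (not confirmed by the abstract): threshold each dimension
    at the mean of its positive values and the mean of its negative values.
    """
    words = list(vectors)
    dims = len(vectors[words[0]])
    pos_mean, neg_mean = [], []
    for d in range(dims):
        column = [vectors[w][d] for w in words]
        pos = [v for v in column if v > 0]
        neg = [v for v in column if v < 0]
        # A dimension with no positive (or negative) values gets an
        # unreachable threshold, so it never emits that symbol.
        pos_mean.append(sum(pos) / len(pos) if pos else float("inf"))
        neg_mean.append(sum(neg) / len(neg) if neg else float("-inf"))

    binarized = {}
    for w in words:
        symbols = []
        for d, v in enumerate(vectors[w]):
            if v >= pos_mean[d]:
                symbols.append("U+")
            elif v <= neg_mean[d]:
                symbols.append("B-")
            else:
                symbols.append("0")
        binarized[w] = symbols
    return binarized


def token_features(word, binarized):
    """Turn a word's binarized vector into sparse discrete features.

    Dimensions labelled "0" are omitted; unknown words yield no features.
    """
    return {
        f"emb[{d}]": s
        for d, s in enumerate(binarized.get(word, []))
        if s != "0"
    }


# Toy 2-dimensional embeddings, invented purely for illustration.
toy = {
    "aspirin": [0.9, -0.1],
    "ibuprofen": [0.7, -0.8],
    "fever": [0.1, 0.5],
    "cough": [-0.6, 0.3],
}
binarized = binarize_embeddings(toy)
features = token_features("ibuprofen", binarized)
```

The resulting dictionaries are discrete, sparse features of the kind a Conditional Random Fields tagger can consume alongside its usual lexical features, which is the role the abstract describes for the binarized embedding features.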