Journal of the American Medical Informatics Association

Automatic extraction of relations between medical concepts in clinical texts.



Abstract

OBJECTIVE: To describe a supervised machine learning approach for discovering relations between medical problems, treatments, and tests mentioned in electronic medical records.

MATERIALS AND METHODS: A single support vector machine classifier was used to identify relations between concepts and to assign their semantic type. Several resources, such as Wikipedia, WordNet, General Inquirer, and a relation similarity metric, inform the classifier.

RESULTS: The techniques reported in this paper were evaluated in the 2010 i2b2 Challenge and obtained the highest F1 score for the relation extraction task. When gold standard data for concepts and assertions were available, F1 was 73.7, precision was 72.0, and recall was 75.3. F1 is defined as 2*Precision*Recall/(Precision+Recall). When concepts and assertions were instead discovered automatically, F1 was 48.4, precision was 57.6, and recall was 41.7.

DISCUSSION: Although a rich set of features was developed for the classifiers presented in this paper, little knowledge mining was performed from medical ontologies such as those found in UMLS. Future studies should incorporate features extracted from such knowledge sources, which we expect to further improve the results. Moreover, each relation discovery was treated independently; joint classification of relations may further improve the quality of results. Joint learning of the discovery of concepts, assertions, and relations may likewise improve the results of automatic relation extraction.

CONCLUSION: Lexical and contextual features proved to be very important in relation extraction from medical texts. When they are not available to the classifier, the F1 score decreases by 3.7%. Likewise, when features based on similarity are not available, the F1 score decreases by 1.1%.
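The F1 definition quoted in the Results can be checked directly against the reported precision and recall. The snippet below is a minimal sketch, not code from the paper: the function name `f1_score` and the zero-denominator guard are assumptions added for illustration, and the inputs are the rounded percentages given in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall: 2*P*R / (P + R).

    Hypothetical helper for illustration; returns 0.0 when both
    inputs are zero to avoid dividing by zero (an assumption, the
    abstract does not discuss this case).
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in percent) as reported in the abstract.
gold = f1_score(72.0, 75.3)   # gold standard concepts and assertions
auto = f1_score(57.6, 41.7)   # automatically discovered concepts and assertions

print(f"gold standard inputs: F1 = {gold:.1f}")   # prints 73.6
print(f"automatic inputs:     F1 = {auto:.1f}")   # prints 48.4
```

Recomputing from the rounded inputs gives 73.6 for the gold standard setting, 0.1 below the reported 73.7. A gap of that size is consistent with precision and recall having been rounded to one decimal before publication. The automatic setting reproduces the reported 48.4.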
