Journal of the American Medical Informatics Association

Feature engineering combined with machine learning and rule-based methods for structured information extraction from narrative clinical discharge summaries.

Abstract

A system that translates narrative text in the medical domain into a structured representation is in great demand. The system performs three sub-tasks: concept extraction, assertion classification, and relation identification.

The overall system consists of five steps: (1) pre-processing sentences, (2) marking noun phrases (NPs) and adjective phrases (APs), (3) extracting concepts, using a dosage-unit dictionary to dynamically switch between two models based on Conditional Random Fields (CRF), (4) classifying assertions by voting over five classifiers, and (5) identifying relations using normalized sentences with a set of effective discriminating features. Macro-averaged and micro-averaged precision, recall, and F-measure were used to evaluate the results.

Performance is competitive with state-of-the-art systems, with micro-averaged F-measures of 0.8489 for concept extraction, 0.9392 for assertion classification, and 0.7326 for relation identification.

The system exploits an array of common features and achieves state-of-the-art performance; prudent feature engineering is the foundation of our systems. In concept extraction, we demonstrated that switching between models, one of which is designed specifically for telegraphic sentences, significantly improved extraction of treatment concepts. In assertion classification, a set of features derived from a rule-based classifier proved effective for classes such as conditional and possible, which would otherwise suffer from data scarcity in conventional machine-learning methods. In relation identification, we use a two-stage architecture whose second stage applies pairwise classifiers to the possible candidate classes; this architecture significantly improves performance.
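The model-switching idea in the concept-extraction step (3) can be sketched as a dictionary lookup that routes each sentence to one of two taggers. This is a minimal illustration under stated assumptions, not the authors' implementation: the dosage-unit dictionary, the tokenizer, and the rule "any dosage unit means the sentence is telegraphic" are all hypothetical, and the two taggers are stand-ins for trained CRF models.

```python
import re
from typing import Callable, Dict, List, Sequence

# Hypothetical dosage-unit dictionary for illustration only; the paper's
# actual dictionary is not given in the abstract.
DOSAGE_UNITS = {"mg", "mcg", "g", "ml", "units", "tab", "tabs",
                "cap", "caps", "puff", "puffs"}

# A tagger maps a token sequence to one label per token (e.g. BIO tags).
Tagger = Callable[[List[str]], List[str]]


def tokenize(sentence: str) -> List[str]:
    """Split letters, numbers and punctuation apart, so '81mg' -> ['81', 'mg']."""
    return re.findall(r"[A-Za-z]+|\d+(?:\.\d+)?|\S", sentence)


def select_model(tokens: Sequence[str]) -> str:
    """Assumed rule: sentences containing a dosage unit go to the telegraphic model."""
    if any(tok.lower() in DOSAGE_UNITS for tok in tokens):
        return "telegraphic"
    return "general"


def extract_concepts(sentence: str, models: Dict[str, Tagger]) -> List[str]:
    """Tag one sentence with whichever model the dictionary lookup selects."""
    tokens = tokenize(sentence)
    return models[select_model(tokens)](tokens)
```

In practice each entry of `models` would wrap a trained CRF tagger; for a quick check, stubs such as `{"general": lambda t: ["O"] * len(t), "telegraphic": lambda t: ["B-treatment"] + ["O"] * (len(t) - 1)}` make the routing visible. Keeping the switch outside the models means each CRF can be trained only on the sentence style it is meant to handle.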
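The abstract states that features derived from a rule-based classifier helped sparse assertion classes such as conditional and possible. One plausible shape for such features is a set of binary "cue fired" indicators that are appended to the machine-learning feature vector. The cue patterns and feature names below are hypothetical; the paper's rule set is not described in the abstract.

```python
import re
from typing import Dict, List

# Hypothetical cue patterns for illustration; the paper's rule-based
# classifier and its cue lists are not described in the abstract.
CUES: Dict[str, List[str]] = {
    "possible": [r"\bpossible\b", r"\bprobabl[ey]\b", r"\bmay\b",
                 r"\bsuspect(?:ed)?\b", r"\brule out\b"],
    "conditional": [r"\bif\b", r"\bwhen(?:ever)?\b", r"\bupon\b",
                    r"\bwith exertion\b"],
    "absent": [r"\bno\b", r"\bdenies\b", r"\bwithout\b",
               r"\bnegative for\b"],
}


def rule_features(sentence: str) -> Dict[str, float]:
    """Return one binary feature per assertion class: 1.0 if any of its cues matches."""
    text = sentence.lower()
    return {
        f"rule_{label}": float(any(re.search(p, text) for p in patterns))
        for label, patterns in CUES.items()
    }
```

A learner's feature dictionary could then be extended with `features.update(rule_features(sentence))`. The intuition is that a rare class with few training examples can still be recognized when a hand-written cue fires. A real system would likely scope cues to a window around the concept rather than the whole sentence, as this sketch does.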
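Step (4) classifies assertions by voting over five classifiers. A minimal sketch of plain majority voting follows. With five voters and several labels, splits such as 2-2-1 are possible, so some tie-break is needed; the priority order used here is a hypothetical choice (the abstract does not say how ties are resolved), and the label names are illustrative.

```python
from collections import Counter
from typing import Sequence

# Hypothetical tie-break order (sparser classes first); the abstract does not
# say how the five-way vote resolves ties. Every predicted label must appear here.
PRIORITY = ("conditional", "possible", "absent", "present")


def vote(predictions: Sequence[str], priority: Sequence[str] = PRIORITY) -> str:
    """Majority vote over classifier outputs; ties go to the earlier label in `priority`."""
    counts = Counter(predictions)
    best = max(counts.values())
    tied = [label for label, n in counts.items() if n == best]
    return min(tied, key=priority.index)
```

For example, `vote(["present", "possible", "possible", "present", "absent"])` is a 2-2-1 split that resolves to `"possible"` under this ordering. Favouring sparse classes on ties is one way to counteract the bias toward the majority class that the abstract alludes to, but weighted voting would be an equally reasonable alternative.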
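The two-stage relation architecture, in which the second stage applies pairwise classifiers to candidate classes, can be sketched as follows. Here the first stage is assumed to be a multi-class scorer whose top-k classes become the candidates, and the second stage runs one-vs-one classifiers among those candidates. The value of `k`, the relation label names, and the stub classifier are all hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]
# A pairwise classifier decides between exactly two relation labels.
PairwiseClassifier = Callable[[Features], str]


def stage_one(scores: Dict[str, float], k: int = 2) -> List[str]:
    """Keep the k relation classes with the highest first-stage scores."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def stage_two(features: Features, candidates: List[str],
              pairwise: Dict[Tuple[str, str], PairwiseClassifier]) -> str:
    """One-vs-one vote among candidates; ties keep the stage-one ranking."""
    if len(candidates) == 1:
        return candidates[0]
    wins = {label: 0 for label in candidates}
    # Pairwise classifiers are keyed by the alphabetically sorted label pair.
    for a, b in combinations(sorted(candidates), 2):
        wins[pairwise[(a, b)](features)] += 1
    # max() returns the first maximal item, i.e. the higher stage-one rank.
    return max(candidates, key=lambda label: wins[label])
```

Usage with made-up scores: `stage_one({"no_relation": 0.05, "treatment_for_problem": 0.45, "treatment_causes_problem": 0.40})` keeps the two treatment labels, and a dedicated classifier for that pair then makes the final call. The design narrows a many-way decision to a few confusable classes, each separated by a classifier trained only on that distinction.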
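The evaluation uses both micro- and macro-averaged precision, recall, and F-measure. The difference matters when classes are imbalanced, so a small worked sketch may help. Micro-averaging pools true positives, false positives, and false negatives across classes before scoring. Macro-averaging scores each class and then takes the unweighted mean. Macro F-measure has more than one definition in the literature; this sketch averages per-class F-measures, which may differ from the paper's exact choice. The counts in the usage note are invented.

```python
from typing import Dict, Tuple

Counts = Tuple[int, int, int]        # (true positives, false positives, false negatives)
Scores = Tuple[float, float, float]  # (precision, recall, F-measure)


def prf(tp: int, fp: int, fn: int) -> Scores:
    """Precision, recall and balanced F-measure; 0.0 where a ratio is undefined."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def micro_macro(per_class: Dict[str, Counts]) -> Tuple[Scores, Scores]:
    # Micro: pool counts over all classes, then score once (frequent classes dominate).
    tp, fp, fn = (sum(c[i] for c in per_class.values()) for i in range(3))
    micro = prf(tp, fp, fn)
    # Macro: score each class separately, then take the unweighted mean.
    scores = [prf(*c) for c in per_class.values()]
    macro = tuple(sum(s[i] for s in scores) / len(scores) for i in range(3))
    return micro, macro
```

With toy counts `{"problem": (80, 10, 20), "treatment": (5, 5, 5)}`, the micro F-measure is 170/210, about 0.81, while the macro F-measure is about 0.67. The rare, poorly handled class pulls the macro average down much more than the micro average.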