ETM: Enrichment by topic modeling for automated clinical sentence classification to detect patients' disease history

Bagheri Ayoub; Sammani Arjan; van der Heijden Peter G. M.; Asselbergs Folkert W.; Oberski Daniel L.

Abstract

Given the rapid rate at which text data are being digitally gathered in the medical domain, there is a growing need for automated tools that can analyze clinical notes and classify their sentences in electronic health records (EHRs). This study uses EHR texts to detect patients' disease history from clinical sentences. However, sentences in EHRs are shorter and less topic-focused than those in the general domain, which leads to sparse co-occurrence patterns and a lack of semantic features. To tackle this challenge, current approaches to clinical sentence classification rely on external information to improve classification performance. However, this is impractical owing to the lack of universal medical dictionaries. This study proposes the ETM (enrichment by topic modeling) algorithm, based on latent Dirichlet allocation, to smooth the semantic representations of short sentences. The ETM enriches the text representation by incorporating into it probability distributions generated by an unsupervised algorithm. It takes the length of the original texts into account and enhances the representation through an internal knowledge acquisition procedure. In clinical predictive modeling, interpretability improves the acceptance of a model. Thus, for clinical sentence classification, the ETM approach starts from a TF-IDF (term frequency-inverse document frequency) representation and uses support vector machine and neural network algorithms for the classification task. We conducted three sets of experiments on a data set consisting of clinical cardiovascular notes from the Netherlands to compare the sentence classification performance of the proposed method with that of prevalent approaches. The results show that the proposed ETM approach outperformed state-of-the-art baselines.
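The enrichment idea described in the abstract can be illustrated with a minimal sketch: compute a TF-IDF vector for each sentence, estimate a document-topic distribution with latent Dirichlet allocation, and append the topic probabilities to the TF-IDF features. This is not the authors' implementation. The abstract does not specify how ETM weights the topic distributions by sentence length, so that step is omitted; the function names (`tfidf`, `lda_gibbs`, `enrich`), the smoothed idf formula, the collapsed Gibbs sampler, the hyperparameters, and the toy sentences are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def tfidf(docs):
    """Return (vocab, rows): one TF-IDF row per tokenized document."""
    vocab = sorted({w for d in docs for w in d})
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))
    # Smoothed idf (an assumption; other idf variants would also work).
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1.0 for w in vocab}
    rows = []
    for d in docs:
        tf = Counter(d)
        rows.append([tf[w] * idf[w] for w in vocab])
    return vocab, rows


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns document-topic distributions."""
    rng = random.Random(seed)
    vocab_size = len({w for d in docs for w in d})
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [Counter() for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assign = []
    # Random initial topic assignment for every token.
    for i, d in enumerate(docs):
        zs = []
        for w in d:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[i][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(n_iter):
        for i, d in enumerate(docs):
            for j, w in enumerate(d):
                # Remove the token's current assignment from the counts.
                z = assign[i][j]
                doc_topic[i][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[i][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[i][j] = z
                doc_topic[i][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    theta = []
    for i, d in enumerate(docs):
        denom = len(d) + n_topics * alpha
        theta.append([(doc_topic[i][k] + alpha) / denom for k in range(n_topics)])
    return theta


def enrich(docs, n_topics=2):
    """Append each document's topic distribution to its TF-IDF vector."""
    _, rows = tfidf(docs)
    theta = lda_gibbs(docs, n_topics)
    return [row + t for row, t in zip(rows, theta)]


# Hypothetical toy sentences, not taken from the study's data set.
docs = [s.lower().split() for s in [
    "history of myocardial infarction",
    "no history of diabetes",
    "patient reports chest pain",
    "chest pain after exercise",
]]
enriched = enrich(docs, n_topics=2)
print(len(enriched), len(enriched[0]))
```

The enriched vectors could then be passed to any standard classifier, such as the support vector machine or neural network mentioned in the abstract. In practice a library implementation of TF-IDF and LDA would likely replace the hand-written versions above.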

Bibliographic record

Source
Journal of Intelligent Information Systems, 2020, Issue 2, pp. 329-349 (21 pages)
Authors
Bagheri Ayoub; Sammani Arjan; van der Heijden Peter G. M.; Asselbergs Folkert W.; Oberski Daniel L.
Affiliations

Univ Utrecht Dept Methodol & Stat Utrecht Netherlands|Univ Med Ctr Utrecht Div Heart & Lungs Dept Cardiol Utrecht Netherlands;

Univ Med Ctr Utrecht Div Heart & Lungs Dept Cardiol Utrecht Netherlands;

Univ Utrecht Dept Methodol & Stat Utrecht Netherlands|Univ Southampton S3RI Fac Social Sci Southampton Hants England;

Univ Med Ctr Utrecht Div Heart & Lungs Dept Cardiol Utrecht Netherlands|UCL Inst Cardiovasc Sci Fac Populat Hlth Sci London England|UCL Hlth Data Res UK Inst Hlth Informat London England;

Univ Utrecht Dept Methodol & Stat Utrecht Netherlands|Univ Med Ctr Utrecht Julius Ctr Hlth Sci & Primary Care Utrecht Netherlands;

Original format: PDF
Language: English
Keywords
Sentence classification; Clinical sentence classification; Short text classification; Latent Dirichlet allocation; Enriched text representation;

Similar literature

1. Topic-Aware Deep Compositional Models for Sentence Classification [J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing. 2017, Issue 2
2. Automated ancillary cancer history classification for mesothelioma patients from free-text clinical reports [J]. Richard A. Wilson, Wendy W. Chapman, Shawn J. DeFries. Journal of Pathology Informatics. 2010, Issue 1
3. Automated classification of patents: A topic modeling approach [J]. Junghwan Yun, Youngjung Geum. Computers & Industrial Engineering. 2020, Issue Sepa
4. Detecting the Mobility of Patient with Chronic Diseases in Online Health Communities using Ant Colony Optimization Algorithm Ensure Patient’s Safety and Diseases Awareness based on Reliable Medical Education Material [C]. Victor Zogbochi, Thierry EDOH, Joel T. Hounsou. International Conference on Smart Cities and Communities. 2018
5. Effective Classification of Clinical Reports: Natural Language Processing-Based and Topic Modeling-Based Approaches [D]. Sarioglu, Efsun Selin. 2014
6. Automated classification of focal breast lesions according to S-detect: validation and role as a clinical and teaching tool [O]. Mattia Di Segni, Valeria de Soccio, Vito Cantisani. 2018
7. ETM: Enrichment by topic modeling for automated clinical sentence classification to detect patients’ disease history [O]. Ayoub Bagheri, Arjan Sammani, Peter G. M. van der Heijden. 2020
8. Conversion of ICD-9 (International Classification of Diseases, Ninth Revision) and ICPM (International Classification of Procedures in Medicine) Data to ICD-9-CM (Clinical Modification) with Adaptation to DRGs. Appendix F. ICD-9 a [R]. Baker, S. W., Austin, V. R., Clay, J. A. 1987
