Journal of Intelligent Information Systems

ETM: Enrichment by topic modeling for automated clinical sentence classification to detect patients' disease history

Abstract

Given the rapid rate at which text data are being digitally gathered in the medical domain, there is a growing need for automated tools that can analyze clinical notes and classify their sentences in electronic health records (EHRs). This study uses EHR texts to detect patients' disease history from clinical sentences. However, sentences in EHRs are shorter and less topic-focused than those in the general domain, which leads to sparse co-occurrence patterns and a lack of semantic features. To tackle this challenge, current approaches to clinical sentence classification depend on external information to improve classification performance. However, this is often impractical owing to the lack of universal medical dictionaries. This study proposes the ETM (enrichment by topic modeling) algorithm, based on latent Dirichlet allocation, to smooth the semantic representations of short sentences. The ETM enriches the text representation by incorporating into it probability distributions generated by an unsupervised algorithm. It considers the length of the original texts and enhances the representation through an internal knowledge acquisition procedure. In clinical predictive modeling, interpretability improves the acceptance of a model. Thus, for clinical sentence classification, the ETM approach starts from a TF-IDF (term frequency-inverse document frequency) representation and uses support vector machine and neural network algorithms for the classification task. We conducted three sets of experiments on a data set of clinical cardiovascular notes from the Netherlands to compare the sentence classification performance of the proposed method with that of prevalent approaches. The results show that the proposed ETM approach outperformed state-of-the-art baselines.
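The enrichment idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes that "enrichment" means appending each sentence's LDA topic distribution to its TF-IDF vector, it uses a small collapsed Gibbs sampler as a stand-in for whatever LDA inference the paper uses, and it omits the length-aware step, which the abstract does not specify. The function names (`tfidf`, `lda_doc_topics`, `enrich`), the hyperparameters, and the toy sentences are all hypothetical.

```python
import math
import random
from collections import Counter


def tfidf(docs):
    """Return (vocab, rows): one TF-IDF vector per tokenized document."""
    vocab = sorted({w for d in docs for w in d})
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1.0 for w in vocab}
    rows = []
    for d in docs:
        tf = Counter(d)
        rows.append([tf[w] * idf[w] for w in vocab])
    return vocab, rows


def lda_doc_topics(docs, vocab, k=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Estimate document-topic distributions with collapsed Gibbs sampling."""
    rng = random.Random(seed)
    v = len(vocab)
    index = {w: i for i, w in enumerate(vocab)}
    ndk = [[0] * k for _ in docs]      # topic counts per document
    nkw = [[0] * v for _ in range(k)]  # word counts per topic
    nk = [0] * k                       # total token count per topic
    z = []                             # topic assignment of every token
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            t = rng.randrange(k)       # random initial assignment
            zd.append(t)
            ndk[d][t] += 1
            nkw[t][index[w]] += 1
            nk[t] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t, wi = z[d][i], index[w]
                # Remove the token, then resample its topic.
                ndk[d][t] -= 1
                nkw[t][wi] -= 1
                nk[t] -= 1
                weights = [
                    (ndk[d][j] + alpha) * (nkw[j][wi] + beta) / (nk[j] + v * beta)
                    for j in range(k)
                ]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t
                ndk[d][t] += 1
                nkw[t][wi] += 1
                nk[t] += 1
    # Smoothed document-topic distribution; each row sums to 1.
    return [
        [(ndk[d][j] + alpha) / (len(doc) + k * alpha) for j in range(k)]
        for d, doc in enumerate(docs)
    ]


def enrich(tfidf_rows, thetas):
    """Assumed enrichment step: append topic probabilities to TF-IDF features."""
    return [row + theta for row, theta in zip(tfidf_rows, thetas)]


# Made-up toy sentences, for illustration only (not from the paper's data set).
sentences = [
    "patient has history of myocardial infarction",
    "no history of diabetes",
    "history of atrial fibrillation and hypertension",
    "patient denies chest pain",
]
docs = [s.split() for s in sentences]
vocab, rows = tfidf(docs)
thetas = lda_doc_topics(docs, vocab, k=2)
enriched = enrich(rows, thetas)
```

Each row of `enriched` holds the sparse TF-IDF weights followed by a dense topic distribution, so even a very short sentence carries some corpus-level signal. The enriched vectors could then be passed to a classifier such as a support vector machine, as the abstract describes.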
