Journal: BMC Medical Informatics and Decision Making

Explanation and prediction of clinical data with imbalanced class distribution based on pattern discovery and disentanglement


Abstract

Statistical data analysis, especially advanced machine learning (ML) methods, has attracted considerable interest in clinical practice. We seek interpretability of diagnostic and prognostic results that will bring confidence to doctors, patients and their relatives in therapeutics and clinical practice. When datasets are imbalanced across diagnostic categories, ordinary ML methods may produce results overwhelmed by the majority classes, diminishing prediction accuracy. Hence, methods are needed that produce explicit, transparent and interpretable results for decision-making without sacrificing accuracy, even for data with imbalanced groups. To interpret clinical patterns and predict patients' diagnoses with high accuracy, we develop a novel method, Pattern Discovery and Disentanglement for Clinical Data Analysis (cPDD), which discovers patterns (correlated traits/indicants) and uses them to classify clinical data even when the class distribution is imbalanced. In the most general setting, a relational dataset is a large table in which each column represents an attribute (trait/indicant) and each row contains the set of attribute values (AVs) of an entity (patient). Compared with existing pattern discovery approaches, cPDD can discover a small, succinct set of statistically significant high-order patterns from clinical data for interpreting and predicting the disease class of patients, even when groups are small and rare. Experiments on synthetic and thoracic clinical datasets showed that cPDD can 1) discover a smaller set of succinct, significant patterns than other existing pattern discovery methods; 2) allow users to interpret succinct sets of patterns coming from uncorrelated sources, even when the groups are rare or small; and 3) achieve better prediction performance than other interpretable classification approaches.
In conclusion, cPDD discovers fewer patterns with more comprehensive coverage, improving the interpretability of the patterns discovered. Experimental results on synthetic data validated that cPDD discovers all patterns implanted in the data and displays them precisely and succinctly, with statistical support for interpretation and prediction, a capability that traditional ML methods lack. The success of cPDD as a novel interpretable method for the imbalanced class problem shows its great potential for clinical data analysis in the years to come.
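
The table-of-attribute-values setting and the imbalance problem described in the abstract can be illustrated with a small sketch. This is not the cPDD algorithm, whose details the abstract does not give. The toy records, the attribute names and the function names are all hypothetical. The adjusted standardized residual is used here only as one common way to test whether two AVs co-occur more often than independence would predict; whether cPDD uses this statistic is an assumption.

```python
from math import sqrt

# Hypothetical toy table: each dict is one patient (row), each key one attribute (column).
# The class attribute "outcome" is imbalanced: 2 "died" versus 8 "survived".
rows = (
    [{"smoker": "yes", "sex": "M", "outcome": "died"},
     {"smoker": "yes", "sex": "F", "outcome": "died"}]
    + [{"smoker": "no", "sex": "M", "outcome": "survived"}] * 4
    + [{"smoker": "no", "sex": "F", "outcome": "survived"}] * 4
)


def has_av(row, av):
    """True if the row contains the attribute value av = (attribute, value)."""
    attribute, value = av
    return row.get(attribute) == value


def adjusted_residual(rows, av_a, av_b):
    """Adjusted standardized residual for the co-occurrence of two AVs.

    Values above roughly 1.96 suggest the pair occurs together more often
    than expected under independence (about the 5% level, two-sided).
    """
    n = len(rows)
    count_a = sum(has_av(r, av_a) for r in rows)
    count_b = sum(has_av(r, av_b) for r in rows)
    count_ab = sum(has_av(r, av_a) and has_av(r, av_b) for r in rows)
    expected = count_a * count_b / n
    variance = expected * (1 - count_a / n) * (1 - count_b / n)
    if variance == 0:
        return 0.0
    return (count_ab - expected) / sqrt(variance)


def majority_baseline_accuracy(rows, class_attribute):
    """Accuracy of always predicting the most frequent class."""
    labels = [r[class_attribute] for r in rows]
    majority = max(set(labels), key=labels.count)
    return labels.count(majority) / len(labels)


if __name__ == "__main__":
    print(majority_baseline_accuracy(rows, "outcome"))
    print(adjusted_residual(rows, ("smoker", "yes"), ("outcome", "died")))
    print(adjusted_residual(rows, ("sex", "M"), ("outcome", "died")))
```

In this toy table, always predicting the majority class looks accurate while never identifying a minority-class patient, which is the failure mode the abstract attributes to ordinary ML methods. The residual, by contrast, still flags the association between an AV and the rare class.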
