Conference: IEEE International Conference on Systems, Man, and Cybernetics
Classification of a real live heart failure clinical dataset- Is TAN Bayes better than other Bayes?

Abstract

Real-life clinical data often present a number of familiar challenges, such as class imbalance, high dimensionality, and missing data. An added complexity is that the data are non-uniformly distributed and skewed. As a result, classical classification methods perform worse on this type of data than on other types. Bayes-based classification is often suggested as a better method; however, the typical assumptions made for Bayes, such as those concerning variables and data distributions, are not satisfied by real clinical data. This paper focuses on improving the performance of Bayesian classifiers and on how the underlying structure of the data affects that performance. It concentrates on two Bayesian methodologies: non-parametric Kernel Density Estimation (KDE) and Tree Augmented Naïve Bayes (TAN). The aim is to measure performance on the heart failure dataset, focusing on how the data structure improves classification. The missing data in the clinical heart failure datasets are replaced using two imputation methods, and the results are compared. We also apply the imputed datasets to three classifiers, namely J48 (decision tree), naïve Bayesian multinomial, and Bayesian network. The experiments show an improvement over naïve Bayes when KDE is used; TAN, however, achieves significant improvement across the different missing-value imputation methods. TAN not only improves classifier performance but also enhances prediction accuracy while maintaining efficiency and model simplicity.
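To make the KDE idea in the abstract concrete, the following is a minimal sketch of a naïve Bayes classifier whose per-feature class-conditional densities are estimated with a Gaussian kernel rather than assumed to follow a parametric distribution. It is not the authors' implementation: the class name `KDENaiveBayes`, the helper `kde_log_density`, the fixed bandwidth of 0.5, and the toy data are all assumptions made for illustration, and the input is assumed to be numeric with missing values already imputed.

```python
import math
from collections import defaultdict


def kde_log_density(x, samples, bandwidth):
    """Log of a Gaussian-kernel density estimate at x, built from samples."""
    norm = len(samples) * bandwidth * math.sqrt(2.0 * math.pi)
    total = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
    # Floor the density so a point far from every sample does not yield log(0).
    return math.log(max(total / norm, 1e-300))


class KDENaiveBayes:
    """Naive Bayes with KDE-based class-conditional densities (illustrative)."""

    def __init__(self, bandwidth=0.5):
        self.bandwidth = bandwidth  # hypothetical fixed value, not tuned

    def fit(self, X, y):
        rows_by_class = defaultdict(list)
        for row, label in zip(X, y):
            rows_by_class[label].append(row)
        n = len(y)
        # Class priors from relative frequencies in the training labels.
        self.log_prior_ = {c: math.log(len(r) / n) for c, r in rows_by_class.items()}
        # For each class, keep one tuple of training values per feature.
        self.columns_ = {c: list(zip(*r)) for c, r in rows_by_class.items()}
        return self

    def log_posterior(self, row):
        """Unnormalised log P(class | row) for every class."""
        scores = {}
        for c, columns in self.columns_.items():
            score = self.log_prior_[c]
            # Naive Bayes assumption: features are independent given the class,
            # so the per-feature log densities simply add up.
            for value, samples in zip(row, columns):
                score += kde_log_density(value, samples, self.bandwidth)
            scores[c] = score
        return scores

    def predict(self, X):
        result = []
        for row in X:
            scores = self.log_posterior(row)
            result.append(max(scores, key=scores.get))
        return result


# Toy data (invented for illustration): two numeric features, two classes.
X_train = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1], [5.0, 6.0], [5.2, 5.9], [4.8, 6.1]]
y_train = [0, 0, 0, 1, 1, 1]
model = KDENaiveBayes(bandwidth=0.5).fit(X_train, y_train)
print(model.predict([[1.1, 2.0], [5.1, 6.0]]))
```

The sketch keeps the naïve independence assumption and changes only how each feature's density is estimated. TAN, by contrast, relaxes the independence assumption itself by letting each feature depend on one other feature in addition to the class. In practice the bandwidth would be chosen by cross-validation or a rule of thumb rather than fixed.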
