Journal: Journal of ambient intelligence and humanized computing

A new complement naive Bayesian approach for biomedical data classification



Abstract

Biomedical data classification is challenging because the data are usually large, noisy, and imbalanced. Noise in particular can degrade classification accuracy, lengthen the time needed to build a classifier, and increase the classifier's size. Most existing learning algorithms therefore integrate various mechanisms to improve learning in noisy environments, yet noise can still have a serious negative impact. A more reasonable solution may be to apply a preprocessing mechanism that handles noisy instances before a learner is formed. We therefore introduce a method called double learning to improve the classification performance of our model. To the best of the authors' knowledge, most previous work isolates the noisy instances and then uses the remaining normal (noise-free) instances for model construction (training). This approach increases the computational cost of model construction for active learners and the total computational time for passive learners. It also ignores minority data instances, which leads to misclassification of minority-group instances at test time. The main idea of this paper is to construct the model from the noisy instances instead: only the identified noisy data, rather than the normal (noise-free) data, are used for model construction. Using fewer instances shortens model construction time, and the approach also improves classification performance. Because noisy instances are used for model construction, the entire naive Bayesian working logic is reversed. The resulting method, called complement naive Bayesian (CNB), uses the idea of complement-based learning to improve accuracy. Finally, the proposed CNB is compared with naive Bayesian and several other classification algorithms on the single photon emission computed tomography (SPECT), Indian liver patient, Wilt, and Tic-Tac-Toe endgame datasets. The experimental results show that the proposed approach is promising in terms of computational time and accuracy on both the balanced and imbalanced datasets used.
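
The abstract does not specify how noisy instances are identified or exactly how the naive Bayesian logic is reversed, so the following is only an illustrative sketch of one possible reading, not the authors' algorithm. It assumes (a) that "noisy" means "misclassified by an ordinary naive Bayes model fitted on all the data" and (b) that "reversed" means predicting the class with the lowest posterior under a model fitted on the noisy instances alone. All function names (`fit_nb`, `flag_noisy`, `fit_double_learning`, `predict`) and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def fit_nb(rows, labels, classes, alpha=1.0):
    """Fit a categorical naive Bayes model with Laplace smoothing."""
    n_features = len(rows[0])
    counts = defaultdict(int)  # (class, feature index, value) -> count
    for row, y in zip(rows, labels):
        for j, v in enumerate(row):
            counts[(y, j, v)] += 1
    return {
        "classes": list(classes),
        "class_counts": Counter(labels),
        # distinct values per feature, used in the smoothing denominator
        "n_values": [len({r[j] for r in rows}) for j in range(n_features)],
        "counts": counts,
        "alpha": alpha,
        "n": len(rows),
    }


def log_posteriors(model, row):
    """Unnormalised log posterior of each class for one instance."""
    a = model["alpha"]
    scores = {}
    for c in model["classes"]:
        nc = model["class_counts"][c]
        lp = math.log((nc + a) / (model["n"] + a * len(model["classes"])))
        for j, v in enumerate(row):
            k = model["n_values"][j] + 1  # one extra slot for unseen values
            lp += math.log((model["counts"].get((c, j, v), 0) + a) / (nc + a * k))
        scores[c] = lp
    return scores


def predict(model, row, complement=False):
    """Pick the most probable class, or the least probable if complement=True."""
    scores = log_posteriors(model, row)
    chooser = min if complement else max
    return chooser(scores, key=scores.get)


def flag_noisy(rows, labels, classes):
    """First pass (assumed criterion): indices a plain NB model misclassifies."""
    model = fit_nb(rows, labels, classes)
    return [i for i, (r, y) in enumerate(zip(rows, labels)) if predict(model, r) != y]


def fit_double_learning(rows, labels, classes):
    """Second pass (assumed): fit on flagged instances only, reverse the decision."""
    noisy = flag_noisy(rows, labels, classes)
    if not noisy:  # nothing flagged: fall back to ordinary naive Bayes
        return fit_nb(rows, labels, classes), False
    model = fit_nb([rows[i] for i in noisy], [labels[i] for i in noisy], classes)
    return model, True


# Toy data: "a" usually means "pos", "b" usually means "neg", one flipped label each.
rows = [("a",)] * 4 + [("a",)] + [("b",)] * 4 + [("b",)]
labels = ["pos"] * 4 + ["neg"] + ["neg"] * 4 + ["pos"]
model, complement = fit_double_learning(rows, labels, ["pos", "neg"])
print(predict(model, ("a",), complement), predict(model, ("b",), complement))
# prints: pos neg
```

In this toy case the second model is fitted on only two of the ten instances, which illustrates the abstract's point about reducing the number of training instances. Whether the reversed rule is accurate on real data depends on how the noisy instances are chosen, which this sketch does not attempt to reproduce.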
