International Symposium on Knowledge and Systems Sciences

THE IMPROVEMENT OF NAIVE BAYESIAN CLASSIFIER BASED ON THE STRATEGY OF FEATURE SELECTION AND SAMPLE CLEANING


Abstract

The Naive Bayesian Classifier (NBC) is a simple and effective classification model. Although it offers advantages over many other classifiers, it does not always yield satisfactory results. In this paper, we summarize previous improvement methods for the NBC model and then propose three improvement strategies: a feature selection strategy, a sample cleaning strategy, and a mixed strategy. The first method reduces the dimensionality of the dataset by choosing an optimized feature subset according to the feature important factor (FIF) of each feature; the second method deletes noisy samples from the training dataset according to the sample polluting factor; the third method combines the two, applying feature selection first and then sample cleaning. Experimental comparison and analysis on datasets from the UCI repository show that these strategies are effective. On average, the first method raises prediction accuracy by 2.30% while keeping only 36.76% of the features in the original feature set, and the second method raises prediction accuracy by 1.59% while keeping 92.57% of the samples in the training dataset. The third method increases prediction accuracy by 2.55%. Among these strategies, the mixed strategy outperforms the other two, reducing the complexity of the model while increasing the prediction accuracy of the NBC model.
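The abstract describes the three strategies without giving the formulas for the feature important factor (FIF) or the sample polluting factor. The Python sketch below only illustrates the overall pipeline under stated assumptions: mutual information stands in for the FIF, cross-validated misclassification stands in for the sample polluting factor, and the helper names (select_features, clean_samples, mixed_strategy) and keep ratios are hypothetical illustrations, not the paper's definitions.

# A minimal sketch of the three strategies, assuming scikit-learn.
# Stand-ins (assumptions): mutual information for the FIF,
# cross-validated misclassification for the sample polluting factor.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_predict

def select_features(X, y, keep_ratio=0.37):
    # Strategy 1: keep the highest-scoring fraction of features
    # (keep_ratio is illustrative, roughly matching the paper's 36.76% average).
    scores = mutual_info_classif(X, y, random_state=0)
    k = max(1, int(round(keep_ratio * X.shape[1])))
    return np.argsort(scores)[::-1][:k]

def clean_samples(X, y):
    # Strategy 2: drop training samples that a cross-validated NBC
    # misclassifies, treating them as noisy.
    preds = cross_val_predict(GaussianNB(), X, y, cv=5)
    return np.where(preds == y)[0]

def mixed_strategy(X, y):
    # Strategy 3: feature selection first, then sample cleaning,
    # then train the final NBC on the reduced data.
    cols = select_features(X, y)
    rows = clean_samples(X[:, cols], y)
    model = GaussianNB().fit(X[np.ix_(rows, cols)], y[rows])
    return model, cols

# Usage on a UCI-style dataset bundled with scikit-learn:
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_wine(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model, cols = mixed_strategy(X_tr, y_tr)
print(accuracy_score(y_te, model.predict(X_te[:, cols])))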