
A New Instance-weighting Naive Bayes Text Classifiers




Recent research shows that naive Bayes text classifiers achieve notable classification performance despite their strong assumption of conditional independence among features. Approaches to weakening this unrealistic assumption and improving classification accuracy generally fall into three categories: structure manipulation, feature manipulation, and instance manipulation. Instance manipulation can be further divided into instance weighting and instance selection. In this paper, we propose a new instance-weighting approach to naive Bayes text classification. In this approach, the training dataset is first divided into subsets according to class value. Each training instance in a subset is then weighted according to its distance from the mean of that subset. Experimental results on 15 text document datasets show that, in terms of classification accuracy, our method outperforms three existing naive Bayes text classifiers.
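The weighting scheme described in the abstract can be illustrated with a minimal sketch. The abstract does not state which distance measure is used or how a distance is turned into a weight, so the code below assumes Euclidean distance to the class centroid and a weight of 1 / (1 + distance); both choices, along with the function names `class_mean_weights`, `fit_weighted_multinomial_nb`, and `predict`, are illustrative assumptions and not the paper's actual formulation. The weights are then used as fractional counts in a standard multinomial naive Bayes with Laplace smoothing.

```python
import numpy as np


def class_mean_weights(X, y):
    """Weight each instance by its closeness to the mean of its own class subset.

    Assumption: Euclidean distance and w = 1 / (1 + d). The source abstract
    only says the weight depends on the distance to the subset mean.
    """
    weights = np.empty(X.shape[0], dtype=float)
    for c in np.unique(y):
        idx = np.where(y == c)[0]          # the training subset for class c
        centroid = X[idx].mean(axis=0)     # mean of that subset
        dist = np.linalg.norm(X[idx] - centroid, axis=1)
        weights[idx] = 1.0 / (1.0 + dist)  # closer to the mean -> larger weight
    return weights


def fit_weighted_multinomial_nb(X, y, weights, alpha=1.0):
    """Multinomial naive Bayes in which each document counts `weight` times."""
    classes = np.unique(y)
    n_features = X.shape[1]
    log_prior = np.empty(len(classes))
    log_likelihood = np.empty((len(classes), n_features))
    for k, c in enumerate(classes):
        mask = y == c
        w = weights[mask]
        # Weighted class prior with Laplace smoothing.
        log_prior[k] = np.log((w.sum() + alpha) / (weights.sum() + alpha * len(classes)))
        # Weighted word counts for class c, smoothed over the vocabulary.
        counts = (w[:, None] * X[mask]).sum(axis=0)
        log_likelihood[k] = np.log((counts + alpha) / (counts.sum() + alpha * n_features))
    return classes, log_prior, log_likelihood


def predict(X, classes, log_prior, log_likelihood):
    """Return the class with the highest log posterior score for each row of X."""
    scores = X @ log_likelihood.T + log_prior
    return classes[np.argmax(scores, axis=1)]


# Toy term-count matrix: 4 documents, 3 vocabulary terms, 2 classes.
X_train = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 0]], dtype=float)
y_train = np.array([0, 0, 1, 1])

w_train = class_mean_weights(X_train, y_train)
model = fit_weighted_multinomial_nb(X_train, y_train, w_train)
print(predict(X_train, *model))
```

Under this assumed weighting, documents far from their class mean contribute less to the word-count estimates, which is one plausible reading of how instance weighting can soften the effect of atypical training documents.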


