ICCEE 2010; International Conference on Computer and Electrical Engineering

A Modified Complement Naive Bayes for Chinese Web Page Classification

Abstract

The naive Bayes classifier is effective for text classification; however, it suffers from an under-studied bias effect when dealing with skewed data. The complement naive Bayes classifier was proposed to mitigate this effect. Nevertheless, it shows disappointing results on skewed web page data. To address the poor performance of the complement naive Bayes algorithm on skewed data sets, this paper presents a modified complement naive Bayes algorithm that uses a better estimate of the class prior probability. Comprehensive experiments show that the modified algorithm is highly robust to skewed data and achieves higher precision than the other naive Bayes algorithms. Furthermore, because web page classification differs from plain text classification, the paper presents a title-weighted vector space model and analyzes the effect of the title weighting factor on the classifier's precision. Experimental results show that the title-weighted vector space model improves precision by 5% on average.
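
The two ideas named in the abstract can be illustrated with a minimal Python sketch. It makes several assumptions: the function names, the `title_weight` value, and the smoothing parameter `alpha` are hypothetical, and the classifier is the standard complement naive Bayes formulation without a class prior term. The abstract does not describe the paper's modified prior estimate, so that part is not reproduced here.

```python
import math
from collections import Counter


def title_weighted_counts(title_tokens, body_tokens, title_weight=3.0):
    """Term counts where each title occurrence counts title_weight times.

    title_weight is an illustrative value, not the factor used in the paper.
    """
    counts = Counter()
    for tok in body_tokens:
        counts[tok] += 1.0
    for tok in title_tokens:
        counts[tok] += title_weight
    return counts


def train_complement_nb(docs, labels, alpha=1.0):
    """Standard complement naive Bayes weights (no class prior term).

    docs: list of Counter term-count vectors; labels: list of class names.
    Returns {class: {term: log-probability of term in the class complement}}.
    """
    vocab = sorted({t for d in docs for t in d})
    classes = sorted(set(labels))
    total = Counter()
    per_class = {c: Counter() for c in classes}
    for d, y in zip(docs, labels):
        total.update(d)
        per_class[y].update(d)
    weights = {}
    for c in classes:
        # Counts of each term in all documents that do NOT belong to class c.
        comp = {t: total[t] - per_class[c][t] for t in vocab}
        denom = sum(comp.values()) + alpha * len(vocab)
        weights[c] = {t: math.log((comp[t] + alpha) / denom) for t in vocab}
    return weights


def predict(weights, doc):
    """Pick the class whose complement fits the document worst."""
    def score(c):
        # Terms unseen in training are ignored.
        return sum(f * weights[c].get(t, 0.0) for t, f in doc.items())
    return min(weights, key=score)


if __name__ == "__main__":
    train = [
        (["football"], ["match", "goal"], "sports"),
        (["stock"], ["market", "price"], "finance"),
    ]
    docs = [title_weighted_counts(t, b) for t, b, _ in train]
    labels = [y for _, _, y in train]
    model = train_complement_nb(docs, labels)
    page = title_weighted_counts(["football"], ["price"])
    print(predict(model, page))
```

In this toy setup the title term outweighs the single body term, which is the intuition behind weighting titles more heavily for web pages; the right weighting factor would have to be tuned on real data.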
