Journal: Computer Applications and Software (《计算机应用与软件》)

Spam Webpage Detection Integrating PCA Dimensionality Reduction and Classification Algorithms

         

Abstract

To address both the content features and the link features of spam webpages, this paper designs a spam webpage detection method that integrates principal component analysis (PCA) with a support vector machine (SVM) classifier. The method uses PCA to extract the principal components of the webpage sample features and trains the SVM classifier on those components; AdaBoost is introduced during training to improve classifier performance. In addition, a clustering algorithm is applied to the training and test datasets to resolve the class imbalance between samples. In several groups of comparison experiments on the WebSpam-UK2007 dataset, the proposed scheme achieved the highest detection rate (0.851) among the compared methods.
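The pipeline described in the abstract can be sketched roughly as follows with scikit-learn. This is an illustrative sketch, not the authors' implementation. The abstract does not say which clustering algorithm is used or how it balances the data, so the k-means undersampling in `balance_by_clustering` is an assumption. The function names, the toy Gaussian data in `make_toy_data`, the RBF kernel, and all parameter values are also hypothetical; real experiments would load the WebSpam-UK2007 content and link features instead.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import AdaBoostClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def balance_by_clustering(X, y, random_state=0):
    """Assumed balancing step (the abstract does not specify the algorithm):
    k-means the majority class into as many clusters as there are minority
    samples, then keep the majority sample nearest each centroid.
    Assumes binary labels and a strictly smaller minority class."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    X_min, X_maj = X[y == minority], X[y != minority]
    y_maj = y[y != minority]
    km = KMeans(n_clusters=len(X_min), n_init=10, random_state=random_state)
    dist = km.fit(X_maj).transform(X_maj)  # shape: (n_majority, n_clusters)
    keep = np.unique(dist.argmin(axis=0))  # nearest sample per centroid
    X_bal = np.vstack([X_min, X_maj[keep]])
    y_bal = np.concatenate([np.full(len(X_min), minority), y_maj[keep]])
    return X_bal, y_bal


def build_detector(n_components=5, n_estimators=5):
    """PCA-reduced features feeding an AdaBoost ensemble of SVMs.
    Kernel and sizes are placeholder choices, not values from the paper."""
    # probability=True lets the SVM expose predict_proba, which some
    # scikit-learn versions of AdaBoost require from the base estimator.
    svm = SVC(kernel="rbf", probability=True, random_state=0)
    return make_pipeline(
        StandardScaler(),
        PCA(n_components=n_components),
        AdaBoostClassifier(svm, n_estimators=n_estimators, random_state=0),
    )


def make_toy_data(n_normal=300, n_spam=40, n_features=20, seed=0):
    """Hypothetical imbalanced stand-in data: label 0 = normal, 1 = spam."""
    rng = np.random.default_rng(seed)
    X = np.vstack([
        rng.normal(0.0, 1.0, (n_normal, n_features)),
        rng.normal(2.0, 1.0, (n_spam, n_features)),
    ])
    y = np.array([0] * n_normal + [1] * n_spam)
    return X, y


if __name__ == "__main__":
    X_train, y_train = balance_by_clustering(*make_toy_data(seed=0))
    X_test, y_test = balance_by_clustering(*make_toy_data(seed=1))
    detector = build_detector().fit(X_train, y_train)
    print("accuracy on toy data:", detector.score(X_test, y_test))
```

Passing the SVM as the first positional argument to `AdaBoostClassifier` avoids depending on the keyword name for the base estimator, which has changed across scikit-learn releases. The accuracy printed on the toy data says nothing about the 0.851 detection rate reported in the abstract.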
