Journal: 《计算机应用与软件》 (Computer Applications and Software)

Web Text Classification Based on Fuzzy Weighted Proximal Support Vector Machine

             

Abstract

Web text classification is an active research topic in data mining. To address the high dimensionality and class imbalance of Web text data sets, this paper introduces fuzzy membership and a balance factor into the proximal support vector machine (PSVM), yielding the fuzzy weighted proximal support vector machine (FWPSVM). The method first computes the average density of the samples and combines it with the sample counts to obtain the balance factor. This overcomes the weakness of traditional weighted algorithms, which set weights from sample counts alone, and mitigates the shift of the separating hyperplane caused by imbalanced data. It then computes a fuzzy membership for each sample to eliminate the classification error caused by noise and outliers. Because PSVM trains markedly faster than the standard SVM, it is better suited to classifying high-dimensional data. Experiments show that the algorithm effectively improves classification accuracy on imbalanced data and yields some improvement in both training speed and classification quality on Web text.
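
The abstract gives no formulas, so the following is only a minimal sketch of how per-sample weights can enter a linear PSVM. It assumes the standard linear PSVM closed form, with each squared slack scaled by a weight s_i, which gives the linear system (I/nu + E^T S E) z = E^T S d with E = [X, -1], S = diag(s), and z = [w; gamma]. The weighting functions are hypothetical stand-ins, not the paper's method: `centroid_membership` uses distance to the class centroid as the fuzzy membership, and `size_balance` uses inverse class frequency only, without the density term the paper describes. All function names and the synthetic data are illustrative.

```python
import numpy as np

def centroid_membership(X, d, delta=1e-3):
    """Stand-in fuzzy membership (assumption): samples far from their
    class centroid, such as noise or outliers, receive a lower weight."""
    s = np.empty(len(d), dtype=float)
    for label in (-1, 1):
        mask = d == label
        dist = np.linalg.norm(X[mask] - X[mask].mean(axis=0), axis=1)
        s[mask] = 1.0 - dist / (dist.max() + delta)
    return s

def size_balance(d):
    """Stand-in balance factor (assumption): inverse class frequency.
    The paper also uses average sample density, which is omitted here."""
    n_pos, n_neg = np.sum(d == 1), np.sum(d == -1)
    return np.where(d == 1, len(d) / (2.0 * n_pos), len(d) / (2.0 * n_neg))

def fit_weighted_psvm(X, d, weights, nu=1.0):
    """Solve (I/nu + E^T S E) z = E^T S d for z = [w; gamma]."""
    E = np.hstack([X, -np.ones((X.shape[0], 1))])
    lhs = np.eye(E.shape[1]) / nu + E.T @ (weights[:, None] * E)
    rhs = E.T @ (weights * d)
    z = np.linalg.solve(lhs, rhs)
    return z[:-1], z[-1]

def predict(X, w, gamma):
    """Assign +1 or -1 according to the side of the plane x.w = gamma."""
    return np.where(X @ w - gamma >= 0, 1, -1)

# Synthetic imbalanced data: 200 negative samples and 20 positive samples.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, (200, 5)), rng.normal(2.0, 1.0, (20, 5))])
d = np.concatenate([-np.ones(200), np.ones(20)])

weights = centroid_membership(X, d) * size_balance(d)
w, gamma = fit_weighted_psvm(X, d, weights, nu=1.0)
accuracy = (predict(X, w, gamma) == d).mean()
```

Training reduces to a single linear solve of size (n_features + 1), which is where the speed advantage of PSVM over a standard SVM's quadratic program comes from. For very high-dimensional text features, a sparse or iterative solver would likely replace the dense `np.linalg.solve` call.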
