Imbalanced Web Spam Classification Using Self-labeled Techniques and Multi-classifier Models



Abstract

Web spam has become a critical problem in web search. Highly imbalanced class distributions and large numbers of unlabeled instances degrade classifier performance. In this paper, we address the severe class imbalance of web spam within a semi-supervised learning framework. First, we introduce self-labeled techniques and multi-classifier models. Second, we describe the imbalance of web spam data sets and propose five combination methods. In particular, we propose several improved self-labeled methods that apply the classic over-sampling technique SMOTE in the pre-processing stage to balance the uneven labeled sets. Further, given the severe imbalance of web spam, we introduce the AUC value into semi-supervised classification. Experiments on WEBSPAM-UK2007 indicate that our methods achieve better recall and AUC values.
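
The pipeline the abstract outlines (over-sample the labeled minority class with SMOTE, self-label the unlabeled pool, then evaluate with AUC) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the five combination methods, the base classifiers, or any parameters. The nearest-centroid scorer, the toy 2-D data, the function names, and the values of `k`, `rounds`, and `per_round` are all assumptions made for illustration, and the SMOTE routine is a simplified pure-Python version of the technique.

```python
import math
import random


def smote(minority, n_new, k=3, rng=None):
    """SMOTE-style oversampling: each synthetic point lies on the segment
    between a minority point and one of its k nearest minority neighbours.
    Assumes at least two minority points."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = minority[:i] + minority[i + 1:]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def centroid(points):
    return tuple(sum(c) / len(points) for c in zip(*points))


def spam_score(x, spam, normal):
    """Stand-in scorer (assumption): higher means closer to the spam centroid."""
    return math.dist(x, centroid(normal)) - math.dist(x, centroid(spam))


def self_train(spam, normal, unlabeled, rounds=3, per_round=2):
    """Self-labeling loop: each round, the most confidently scored unlabeled
    points at both extremes are moved into the labeled sets."""
    spam, normal, pool = list(spam), list(normal), list(unlabeled)
    for _ in range(rounds):
        if len(pool) < 2 * per_round:
            break
        pool.sort(key=lambda x: spam_score(x, spam, normal))
        new_normal, new_spam = pool[:per_round], pool[-per_round:]
        pool = pool[per_round:-per_round]
        normal += new_normal
        spam += new_spam
    return spam, normal


def auc(scores, labels):
    """AUC as the probability that a random positive outranks a random
    negative, counting ties as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (assumption): 4 labeled spam vs. 40 labeled normal points in 2-D.
rng = random.Random(42)
spam = [(rng.gauss(2.0, 0.5), rng.gauss(2.0, 0.5)) for _ in range(4)]
normal = [(rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)) for _ in range(40)]
unlabeled = ([(rng.gauss(2.0, 0.5), rng.gauss(2.0, 0.5)) for _ in range(5)]
             + [(rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)) for _ in range(45)])

# Pre-processing: balance the labeled set before self-labeling starts.
balanced_spam = spam + smote(spam, len(normal) - len(spam), rng=rng)
spam_set, normal_set = self_train(balanced_spam, normal, unlabeled)

test_points = [(2.1, 1.9), (1.8, 2.2), (0.1, -0.2), (-0.3, 0.4), (0.2, 0.1)]
test_labels = [1, 1, 0, 0, 0]
scores = [spam_score(x, spam_set, normal_set) for x in test_points]
print("AUC:", auc(scores, test_labels))
```

Balancing before self-labeling matters in this sketch because the scorer is re-estimated from the labeled sets every round; with only four spam points, a few mislabeled additions could shift the spam centroid substantially. AUC is used rather than accuracy because, under heavy imbalance, a classifier that labels everything as normal still scores high accuracy.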


Bibliographic details

Source
International Conference on Knowledge Science, Engineering and Management | 2015 | 6 pages
Authors
Xiaonan Fang; Yanyan Tan; Xiyuan Zheng; Huaxiang Zhang; Shuang Zhou;
Original format: PDF
Chinese Library Classification: TP18-53
Keywords
Imbalanced datasets; Web spam; Semi-supervised learning; Self-labeled techniques; Multi-classifier models; Ensemble learning;


Similar literature


1. A dynamic model for integrating simple web spam classification techniques [J]. Fdez-Glez Jorge, Ruano-Ordas David, Ramon Mendez Jose. Expert Systems with Applications, 2015, No. 21.
2. Performance Evaluation of User-Behaviour Techniques of Web Spam Detection Models [J]. Oluwatoyin Odukoya, Bodunde Akinyemi, Mohammed Fofana. Network and Complex Systems, 2019, No. 2.
3. Correlation-based feature subset selection technique for web spam classification [J]. Surender Singh, Ashutosh Kumar Singh. International Journal of Web Engineering and Technology, 2018, No. 4.
4. Imbalanced Web Spam Classification Using Self-labeled Techniques and Multi-classifier Models [C]. Xiaonan Fang, Yanyan Tan, Xiyuan Zheng. International Conference on Knowledge Science, Engineering and Management, 2015.
5. Classification techniques for noisy and imbalanced data [D]. Napolitano, Amri. 2009.
6. Beyond the Antagonism: Self-Labeled Xanthone Inhibitors as Modeled Two-in-One Drugs in Cancer Therapy [O]. Fu-Chao Yu, Xin-Rong Lin. 2017.
7. Performance Evaluation of User-Behaviour Techniques of Web Spam Detection Models [O]. 2019.
