Conference: IEEE International Conference on Machine Learning and Applications

Towards Web Spam Filtering Using a Classifier Based on the Minimum Description Length Principle


Abstract

The steady growth and popularization of the Web have led spammers to develop techniques to circumvent search engines in order to gain good visibility for their web pages in search results. These spammers are responsible for serious problems such as dissatisfaction, irritation, exposure to unpleasant or malicious content, and financial loss. Although different machine learning approaches have been used to detect web spam, many of them suffer from the curse of dimensionality or incur a very high computational cost, which impedes their use in real scenarios. Consequently, considerable effort is still being devoted to developing more advanced methods that both prevent overfitting and learn quickly. To fill this gap, we present MDLClass, a classification technique based on the minimum description length principle, applied to web spam filtering. The proposed method is efficient, lightweight, multi-class, and fast. We also evaluate a new approach to detecting web spam that combines the predictions obtained by classifiers using content-based, link-based, and transformed link-based features. In our experiments, we employed two real, public, and large datasets: WEBSPAM-UK2006 and WEBSPAM-UK2007. The results indicate that the proposed MDLClass and the ensemble of predictions using different types of features are promising for the task of web spam filtering.
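The abstract does not give the details of MDLClass, so the following is only a minimal sketch of the general idea behind classification by minimum description length: a sample is assigned to the class whose model encodes it in the fewest bits. Everything here is an assumption made for illustration and not the authors' method: the class name `MinDescriptionLengthSketch`, the token-style features (standing in for discretized content or link features), and the Laplace-style smoothing.

```python
import math
from collections import Counter, defaultdict


class MinDescriptionLengthSketch:
    """Toy classifier (hypothetical, not MDLClass): choose the class
    under which a sample costs the fewest bits to encode."""

    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing          # smoothing constant (assumption)
        self.counts = defaultdict(Counter)  # class -> feature token counts
        self.totals = Counter()             # class -> total tokens seen
        self.vocab = set()                  # all tokens seen in training

    def fit(self, samples, labels):
        for tokens, label in zip(samples, labels):
            self.counts[label].update(tokens)
            self.totals[label] += len(tokens)
            self.vocab.update(tokens)
        return self

    def description_length(self, tokens, label):
        # Bits needed to encode the tokens under the class's smoothed
        # frequency model; the +1 reserves mass for unseen tokens.
        denom = self.totals[label] + self.smoothing * (len(self.vocab) + 1)
        return sum(
            -math.log2((self.counts[label][t] + self.smoothing) / denom)
            for t in tokens
        )

    def predict(self, tokens):
        # The predicted class is the one with the shortest code length.
        return min(
            self.counts,
            key=lambda c: self.description_length(tokens, c),
        )


# Usage with made-up training data (not taken from WEBSPAM-UK2006/2007).
model = MinDescriptionLengthSketch().fit(
    [["cheap", "pills", "buy"], ["university", "research", "news"]],
    ["spam", "normal"],
)
print(model.predict(["buy", "cheap"]))
```

In a sketch like this, training is a single counting pass and adding classes requires no change to the decision rule, which is in the spirit of the lightweight, multi-class behaviour the abstract claims. The paper's actual model and the way it combines predictions across feature types may differ.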
