Research on Web Spam Detection Based on Support Vector Machine

Abstract

With the rapid development of the Internet, spam web pages, which are designed to deceive search engines and raise their rankings in search results, have become prevalent. Web spam is a major problem for today's search engines, so they must be able to detect it during crawling. Web spam detection is treated as a classification problem: a machine learning classification algorithm builds a model that, given a web page, assigns it to one of two categories, Normal or Spam. For the support vector machine (SVM) classification model, a soft-margin classifier based on a linear SVM was trained on the sample set, and penalty functions were defined over the links between web pages, since linked pages tend to have similar characteristics. The classifier therefore exploits not only the content features of web pages but also the link structure between them.
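The abstract does not give the exact form of its link-based penalty functions, so what follows is only a minimal sketch under stated assumptions: a soft-margin linear SVM (averaged hinge loss plus an L2 term on `w`) trained by full-batch subgradient descent, with a hypothetical link penalty that adds the squared difference between the scores of each pair of linked pages, in the spirit of graph-regularized classifiers. The names `train_link_svm` and `classify`, the weights `lam` and `gamma`, the label encoding (+1 for Spam, -1 for Normal) and the two toy content features are illustrative assumptions, not the paper's.

```python
"""Sketch: soft-margin linear SVM with a hypothetical link-based penalty.

Assumptions (not from the paper): labels are +1 = Spam, -1 = Normal; the
link penalty is the squared score difference of two linked pages; training
uses plain full-batch subgradient descent.
"""
from typing import List, Sequence, Tuple


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_link_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    links: Sequence[Tuple[int, int]],
    lam: float = 0.01,   # weight of the L2 (margin) term
    gamma: float = 0.1,  # weight of the hypothetical link penalty
    lr: float = 0.1,
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Minimise, by subgradient descent:

        lam/2 * ||w||^2
        + mean_i max(0, 1 - y_i * (w.x_i + b))   # soft-margin hinge loss
        + gamma * mean_(i,j) (w.x_i - w.x_j)^2   # hypothetical link penalty
    """
    n, d = len(X), len(X[0])
    m = max(len(links), 1)  # avoid dividing by zero when there are no links
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wk for wk in w]
        grad_b = 0.0
        # Hinge loss: only pages inside the margin (or misclassified) contribute.
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:
                for k in range(d):
                    grad_w[k] -= yi * xi[k] / n
                grad_b -= yi / n
        # Link penalty: pulls the scores of linked pages toward each other.
        # b cancels in the score difference, so only w receives a gradient.
        for i, j in links:
            diff = dot(w, X[i]) - dot(w, X[j])
            for k in range(d):
                grad_w[k] += 2 * gamma * diff * (X[i][k] - X[j][k]) / m
        w = [wk - lr * gk for wk, gk in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def classify(w: Sequence[float], b: float, x: Sequence[float]) -> str:
    """Assign a page to one of the two categories by the sign of its score."""
    return "Spam" if dot(w, x) + b >= 0 else "Normal"


if __name__ == "__main__":
    # Toy pages with two hypothetical content features in [0, 1].
    X = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
    y = [1, 1, -1, -1]
    links = [(0, 1), (2, 3)]  # hypothetical hyperlinks between pages
    w, b = train_link_svm(X, y, links)
    print(classify(w, b, [0.85, 0.85]), classify(w, b, [0.15, 0.15]))
```

Raising `gamma` pulls linked pages toward the same side of the decision boundary, which is one way link structure can complement content features; setting `gamma = 0` (or passing no links) reduces the sketch to an ordinary soft-margin linear SVM.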