Journal: Expert Systems with Applications

Detection Of Cloaked Web Spam By Using Tag-based Methods

Abstract

Web spam attempts to influence search engine ranking algorithms in order to boost the rankings of specific web pages in search results. Cloaking is a widely adopted technique for concealing web spam: the server returns different content to search engine crawlers than it displays in a web browser. Previous work on cloaking detection relies mainly on differences in terms and/or links between multiple copies of a URL retrieved from the web browser and search engine crawler perspectives. This work presents three methods that use differences in tags to determine whether a URL is cloaked. Since the tags of a web page generally do not change as frequently or as significantly as its terms and links, tag-based cloaking detection methods can work more effectively than term- or link-based methods. The proposed methods are tested on a dataset of URLs covering short-, medium- and long-term user interest. Experimental results indicate that the tag-based methods outperform term- or link-based methods in both precision and recall. Moreover, a Weka J4.8 classifier using a combination of term and tag features yields an accuracy of 90.48%.
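
The abstract does not spell out the three tag-based methods, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that one copy of a page is fetched as a browser and one as a crawler, and it compares the multisets of HTML tag names in the two copies. The names `extract_tags`, `tag_difference` and `looks_cloaked`, the overlap measure, and the `threshold` value of 0.5 are all hypothetical choices made for illustration.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collects the names of opening tags in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # Only the tag name is recorded; text and attributes are ignored.
        self.tags.append(tag)


def extract_tags(html):
    """Return the list of opening tag names found in an HTML string."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


def tag_difference(html_a, html_b):
    """Share of tag occurrences not common to both copies (0.0 = same tag multiset)."""
    a = Counter(extract_tags(html_a))
    b = Counter(extract_tags(html_b))
    union = sum((a | b).values())   # per-tag maximum counts
    if union == 0:
        return 0.0
    shared = sum((a & b).values())  # per-tag minimum counts
    return 1.0 - shared / union


def looks_cloaked(browser_copy, crawler_copy, threshold=0.5):
    """Flag a URL when the tag difference exceeds an assumed threshold."""
    return tag_difference(browser_copy, crawler_copy) > threshold


if __name__ == "__main__":
    browser = "<html><body><p>hello</p></body></html>"
    crawler = "<html><body><a>k</a><a>k</a><a>k</a><a>k</a></body></html>"
    print(tag_difference(browser, crawler), looks_cloaked(browser, crawler))
```

Because only tag names are compared, two copies that differ in wording but share the same markup score 0.0, which mirrors the abstract's observation that tags change less often than terms and links. A real detector would need to tune the threshold on labelled data.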
