Journal: Computer Science and Information Technology

Detecting Cloaking Web Spam Using Hash Function

Abstract

Web spam is an attempt to boost the ranking of particular pages in search engine results. Cloaking is one such spamming technique, in which a site serves different content to search engine crawlers than to browsers. Previous cloaking detection methods, based on term or link differences between the crawler's and the browser's copies of a page, are not accurate enough. The most recent technique is a tag-based method, which finds cloaked pages better than earlier algorithms. However, examining the content of web pages gives more accurate results. This paper proposes an algorithm based on term differences between the crawler's and the browser's copies. It also addresses dynamic cloaking, a new and more complicated kind of cloaking. To speed up the comparison, we introduce a hash value computed by a hash function. The proposed algorithm was tested on a data set of URLs. Experimental results indicate that it outperforms previous methods in both precision and recall. We estimate that about 9% of the URLs in the data set use static cloaking and about 2% use dynamic cloaking.
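
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, under stated assumptions: hash the term sequence of each copy, compare the hashes first, and fall back to a term-level comparison only when they differ. The tokenizer, the choice of SHA-256, and the function names `terms`, `term_hash`, and `term_difference` are illustrative assumptions, not the paper's method.

```python
import hashlib
import re


def terms(html: str) -> list:
    """Crudely strip tags and return lowercase word terms (assumed tokenizer)."""
    text = re.sub(r"<[^>]*>", " ", html)
    return re.findall(r"[a-z0-9]+", text.lower())


def term_hash(html: str) -> str:
    """Hash the term sequence so two copies can be compared with one equality test."""
    joined = " ".join(terms(html))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def term_difference(crawler_copy: str, browser_copy: str) -> int:
    """Count distinct terms that appear in only one of the two copies."""
    if term_hash(crawler_copy) == term_hash(browser_copy):
        return 0  # same term sequence: skip the detailed comparison
    return len(set(terms(crawler_copy)) ^ set(terms(browser_copy)))
```

In this sketch, a page whose two copies differ only in markup yields equal hashes and is dismissed cheaply, while a large term difference would mark the URL as a cloaking candidate. Where to set that threshold is left open here, since the abstract does not give one.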
