International Symposium on Computational and Business Intelligence

Uncovering Cloaking Web Pages with Hybrid Detection Approaches



Abstract

Web search cloaking, which spammers use to increase visits to their websites, is a spamming technique that is challenging for search engines to counter. Existing cloaking detection systems have two shortcomings: their algorithms are not accurate enough, and the types of cloaking techniques they can detect are limited. In this paper, we present a new system that addresses both problems. To improve detection accuracy, our algorithm combines text-, tag-, and URL-based methods. To detect more types of cloaking techniques, our system drives a real browser to execute the scripts in web pages, crawls each page a second time with a modified referrer field in the HTTP headers, and obtains the search engine's cached page for further comparison. We apply our system to 104,800 URLs extracted from Yahoo. The results show that our system achieves high accuracy, with a precision of 94.52% and a recall of 98.57%, and that it successfully detects more types of cloaking techniques.
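As a rough illustration of the idea of comparing two copies of a page along text, tag, and URL dimensions, the following is a minimal sketch and not the authors' algorithm. It assumes Jaccard similarity as the comparison measure and a threshold of 0.5; both are illustrative choices that the abstract does not specify. The names `PageFeatures`, `cloaking_scores`, `looks_cloaked`, and `fetch_with_referer` are hypothetical, and the default referrer value is only a placeholder.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse
from urllib.request import Request, urlopen


class PageFeatures(HTMLParser):
    """Collect visible words, tag names, and link hosts from one HTML copy."""

    def __init__(self):
        super().__init__()
        self.words, self.tags, self.hosts = set(), set(), set()
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        self.tags.add(tag)
        if tag in ("script", "style"):
            self._skip += 1
        for name, value in attrs:
            if name in ("href", "src") and value:
                host = urlparse(value).netloc
                if host:
                    self.hosts.add(host.lower())

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.words.update(data.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    return 1.0 if not a and not b else len(a & b) / len(a | b)


def extract(html):
    parser = PageFeatures()
    parser.feed(html)
    parser.close()
    return parser


def cloaking_scores(first_html, second_html):
    """Return text, tag, and URL similarity between two copies of a page."""
    a, b = extract(first_html), extract(second_html)
    return {
        "text": jaccard(a.words, b.words),
        "tag": jaccard(a.tags, b.tags),
        "url": jaccard(a.hosts, b.hosts),
    }


def looks_cloaked(first_html, second_html, threshold=0.5):
    """Flag the pair if any similarity falls below the (assumed) threshold."""
    return min(cloaking_scores(first_html, second_html).values()) < threshold


def fetch_with_referer(url, referer="https://search.yahoo.com/"):
    """Second crawl with a modified Referer header (placeholder value)."""
    request = Request(url, headers={"Referer": referer})
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")
```

A caller might fetch a page once without a referrer and once with `fetch_with_referer`, then pass both copies to `looks_cloaked`. Taking the minimum of the three scores makes the check sensitive to a change in any single dimension; a real system would likely weight and tune these signals on labeled data.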
