Conference: International Symposium on Computational and Business Intelligence

Uncovering Cloaking Web Pages with Hybrid Detection Approaches

Abstract

Web search cloaking, which spammers use to increase traffic to their websites, is a spamming technique that is challenging for search engines to detect. Existing cloaking detection systems have two shortcomings: the accuracy of their algorithms is not high enough, and the types of cloaking techniques they can detect are limited. In this paper, we present a new system that addresses these two problems. To improve detection accuracy, our algorithm combines text-, tag-, and URL-based methods. To detect more types of cloaking techniques, our system drives a real browser to execute the scripts in web pages, crawls each page a second time with a modified referrer field in the HTTP headers, and obtains the search engine's cached page for further comparison. We apply our system to 104,800 URLs extracted from Yahoo. The results show that our system achieves high accuracy, with a precision of 94.52% and a recall of 98.57%, and that it successfully detects more types of cloaking techniques.
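The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea of comparing two copies of a page (for example, the copy served to a crawler and the copy rendered in a browser) by their text terms and HTML tags. The function names, the multiset Jaccard difference, and the example URLs are all assumptions made for illustration; they are not the authors' method, and the URL-based component is not modeled here.

```python
import re
import urllib.request
from collections import Counter
from html.parser import HTMLParser


class TermTagCollector(HTMLParser):
    """Collects visible words and tag names from an HTML string."""

    def __init__(self):
        super().__init__()
        self.terms = Counter()
        self.tags = Counter()
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.terms.update(re.findall(r"\w+", data.lower()))


def bag_difference(a: Counter, b: Counter) -> float:
    """1 minus the multiset Jaccard similarity (0 = identical, 1 = disjoint)."""
    total = sum((a | b).values())
    if total == 0:
        return 0.0
    return 1.0 - sum((a & b).values()) / total


def cloaking_scores(crawler_html: str, browser_html: str) -> dict:
    """Hypothetical helper: text and tag differences between two page copies."""
    crawler, browser = TermTagCollector(), TermTagCollector()
    crawler.feed(crawler_html)
    browser.feed(browser_html)
    return {
        "text": bag_difference(crawler.terms, browser.terms),
        "tag": bag_difference(crawler.tags, browser.tags),
    }


def request_with_referrer(url: str, referrer: str) -> urllib.request.Request:
    """Build (but do not send) a request whose Referer header mimics a
    click-through from a search result page."""
    return urllib.request.Request(url, headers={"Referer": referrer})


if __name__ == "__main__":
    seen_by_crawler = "<html><body><p>cheap flights to Paris</p></body></html>"
    seen_by_browser = "<html><body><div><a href='x'>casino</a> bonus</div></body></html>"
    print(cloaking_scores(seen_by_crawler, seen_by_browser))
```

A real detector would need a decision rule on top of these scores (for instance thresholds tuned on labeled pages), since legitimate dynamic pages also differ between fetches; any such thresholds would be an additional assumption.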