International ACM SIGIR Conference on Research and Development in Information Retrieval

Fighting against Web Spam: A Novel Propagation Method based on Click-through Data

Abstract

Combating Web spam is one of the greatest challenges for Web search engines. State-of-the-art anti-spam techniques focus mainly on detecting specific spam strategies, such as content spamming and link-based spamming. Although these approaches have been largely successful, they struggle to keep up with a continuous stream of new spamming techniques. We approach the problem from a new perspective, based on the observation that queries that are more likely to lead to spam pages/sites have two characteristics: 1) they are popular or reflect heavy demand from search engine users, and 2) there are usually few key resources or authoritative results for them. Building on these observations, we propose a novel method based on click-through data analysis that iteratively propagates a spamicity score between queries and URLs, starting from a small set of seed pages/sites. Once the seed pages/sites are obtained, we use the link structure of the click-through bipartite graph to discover other pages/sites that are likely to be spam. Experiments show that our algorithm is both efficient and effective in detecting Web spam. Moreover, combining our method with popular anti-spam techniques such as TrustRank improves performance compared with each technique used individually.
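The abstract does not give the propagation equations. The following is only a minimal sketch of the general idea, under explicit assumptions. It assumes that a query's spamicity is the click-weighted average of its URLs' scores, and that a URL's spamicity is the click-weighted average of its queries' scores blended with its seed label. This restart toward the seeds is borrowed in spirit from TrustRank-style biased propagation. The function name `propagate_spamicity`, the damping factor `alpha`, the iteration count, and the `(query, url) -> clicks` input format are all hypothetical. None of them is the authors' formulation.

```python
from collections import defaultdict


def propagate_spamicity(clicks, seeds, iterations=20, alpha=0.85):
    """Hypothetical sketch: propagate spamicity on a query-URL click graph.

    clicks: dict mapping (query, url) -> positive click count (assumed format)
    seeds:  set of URLs already labeled as spam
    alpha:  assumed weight on propagated evidence vs. the seed prior
    Returns (url_scores, query_scores), each a dict of floats in [0, 1].
    """
    # Build both sides of the bipartite graph, weighted by click counts.
    q_to_u = defaultdict(dict)
    u_to_q = defaultdict(dict)
    for (q, u), c in clicks.items():
        if c <= 0:
            continue  # ignore edges without clicks
        q_to_u[q][u] = q_to_u[q].get(u, 0) + c
        u_to_q[u][q] = u_to_q[u].get(q, 0) + c

    # Seed prior: 1.0 for known spam URLs, 0.0 for everything else.
    prior = {u: (1.0 if u in seeds else 0.0) for u in u_to_q}
    url_score = dict(prior)
    query_score = {}

    for _ in range(iterations):
        # URL -> query: click-weighted average of the clicked URLs' scores.
        query_score = {
            q: sum(c * url_score[u] for u, c in nbrs.items()) / sum(nbrs.values())
            for q, nbrs in q_to_u.items()
        }
        # Query -> URL: click-weighted average of the queries' scores,
        # mixed with the seed prior so seeds keep anchoring the propagation.
        url_score = {
            u: alpha * sum(c * query_score[q] for q, c in nbrs.items()) / sum(nbrs.values())
            + (1 - alpha) * prior[u]
            for u, nbrs in u_to_q.items()
        }
    return url_score, query_score
```

The toy click log below is invented for illustration. `spam2.com` shares only the spam-seeded query, so it inherits a high score. `pharmacy.org` draws most of its clicks from an unrelated query and stays low, and `weather.gov` is unconnected to the seed and stays at zero:

```python
clicks = {
    ("cheap pills", "spam1.com"): 10,
    ("cheap pills", "spam2.com"): 5,
    ("cheap pills", "pharmacy.org"): 1,
    ("pharmacy hours", "pharmacy.org"): 15,
    ("weather", "weather.gov"): 20,
}
urls, queries = propagate_spamicity(clicks, seeds={"spam1.com"})
```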