
Comparing Similarity of HTML Structures and Affiliate IDs in Splog Analysis



Abstract

Spam blogs, or splogs, are blogs hosting spam posts, created with machine-generated or hijacked content for the sole purpose of hosting advertisements or raising the number of in-links to target sites. Among such splogs, this paper focuses on detecting groups of splogs that are estimated to have been created by the same spammer. We compare two clues: the similarity of the HTML structures of splogs, and affiliate IDs automatically extracted from splogs. We first show that HTML structure similarity is quite effective both in detecting splogs and in identifying spammers. We then show that identical affiliate IDs extracted from splogs identify spammers much more directly than HTML structure similarity does, although achieving high coverage in affiliate ID extraction is not easy. Finally, we show that the coverage of the intersection of the two clues is relatively low, so the two clues need to be applied in a complementary manner.
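The abstract does not state which similarity measure or extraction procedure the paper uses, so the following is only a minimal sketch of the two clues under stated assumptions. HTML structure is approximated as the sequence of start-tag names compared with difflib's SequenceMatcher ratio. Affiliate IDs are assumed to appear as a query parameter named "tag" in link URLs (the hypothetical AFFILIATE_PARAM). The 0.9 threshold is likewise a hypothetical choice. Two pages are linked to the same presumed spammer if either clue fires, which mirrors the complementary use the abstract recommends.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlsplit

# Hypothetical: the query parameter assumed to carry an affiliate ID.
AFFILIATE_PARAM = "tag"
# Hypothetical similarity threshold for "same template".
SIMILARITY_THRESHOLD = 0.9


class PageScanner(HTMLParser):
    """Collects start-tag names (the page's structure) and link hrefs."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)  # entities such as &amp; are already unescaped


def scan(html: str) -> PageScanner:
    scanner = PageScanner()
    scanner.feed(html)
    scanner.close()
    return scanner


def structure_similarity(html_a: str, html_b: str) -> float:
    """Similarity in [0, 1] of the two pages' tag sequences (text and attributes ignored)."""
    return SequenceMatcher(None, scan(html_a).tags, scan(html_b).tags).ratio()


def affiliate_ids(html: str) -> set[str]:
    """Affiliate IDs found in the AFFILIATE_PARAM query parameter of any link."""
    ids: set[str] = set()
    for href in scan(html).hrefs:
        ids.update(parse_qs(urlsplit(href).query).get(AFFILIATE_PARAM, []))
    return ids


def same_spammer(html_a: str, html_b: str) -> bool:
    """Complementary rule: a shared affiliate ID OR a near-identical structure."""
    if affiliate_ids(html_a) & affiliate_ids(html_b):
        return True
    return structure_similarity(html_a, html_b) >= SIMILARITY_THRESHOLD
```

Using OR rather than AND reflects the abstract's finding that the intersection of the two clues has low coverage: requiring both would miss most pairs that either clue alone can catch.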
