
Tag spam creates large non-giant connected components


Abstract

Spammers in social bookmarking systems try to mimic the bookmarking behaviour of real users to gain the attention of other users or search engines. Several methods have been proposed for the detection of such spam, including domain-specific features (like URL terms) or similarity of users to previously identified spammers. However, as shown in our previous work, it is possible to identify a large fraction of spam users based on purely structural features. The hypergraph connecting documents, users, and tags can be decomposed into connected components, and any large but non-giant components turned out to be almost entirely inhabited by spam users in the examined dataset. Here, we test to what degree the decomposition of the complete hypergraph is really necessary, examining the component structure of the induced user/document and user/tag graphs. While the user/tag graph's connectivity does not help in classifying spammers, the user/document graph's connectivity is already highly informative. It can, however, be augmented with connectivity information from the hypergraph. In our view, spam detection based on structural features, like the one proposed here, requires complex adaptation strategies from spammers and may complement other, more traditional detection approaches.
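
The component-based idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `mode` switch, the `min_users` threshold for what counts as "large", and the toy posts are all assumptions made for illustration, and the abstract does not state the size cut-off that was actually used. The sketch treats each bookmark as a (user, document, tag) triple, builds connected components with a union-find structure for either the full hypergraph or one of the induced graphs, and flags users in components that are large but not the largest.

```python
from collections import defaultdict


def find(parent, x):
    """Return the representative of x's set, compressing the path on the way."""
    parent.setdefault(x, x)
    root = x
    while parent[root] != root:
        root = parent[root]
    while parent[x] != root:
        parent[x], x = root, parent[x]
    return root


def union(parent, a, b):
    """Merge the sets containing a and b."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[ra] = rb


def user_components(posts, mode="hypergraph"):
    """Group users by connected component.

    posts: iterable of (user, document, tag) triples.
    mode:  "hypergraph" links users, documents and tags;
           "user_document" and "user_tag" use only the induced graph.
    """
    parent = {}
    users = set()
    for user, doc, tag in posts:
        u = ("user", user)
        users.add(u)
        find(parent, u)  # register the user even if nothing gets linked
        if mode in ("hypergraph", "user_document"):
            union(parent, u, ("doc", doc))
        if mode in ("hypergraph", "user_tag"):
            union(parent, u, ("tag", tag))
    groups = defaultdict(set)
    for u in users:
        groups[find(parent, u)].add(u[1])
    return list(groups.values())


def flag_non_giant(components, min_users=3):
    """Return users in components with at least min_users users,
    excluding the single largest (giant) component.
    min_users is a hypothetical threshold, not a value from the paper."""
    if not components:
        return set()
    giant = max(components, key=len)
    flagged = set()
    for comp in components:
        if comp is not giant and len(comp) >= min_users:
            flagged |= comp
    return flagged


# Toy data, invented for illustration only.
posts = [
    ("alice", "d1", "python"), ("bob", "d1", "code"),
    ("carol", "d2", "python"), ("dave", "d2", "news"),
    ("s1", "x1", "buy"), ("s2", "x1", "cheap"), ("s3", "x2", "cheap"),
    ("eve", "d9", "solo"),
]

print(sorted(flag_non_giant(user_components(posts))))  # ['s1', 's2', 's3']
```

Calling `user_components(posts, mode="user_document")` or `mode="user_tag"` on the same triples gives the component structure of the induced graphs, which is the comparison the abstract describes. On real data the threshold and the treatment of ties for the largest component would need to be chosen with care.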
