Conference: International Conference on Enterprise Information Systems

Towards Generating Spam Queries for Retrieving Spam Accounts in Large-Scale Twitter Data

Abstract

Twitter, as a top microblogging site, has become a valuable source of up-to-date, real-time information for a wide range of social-based research and applications. Intuitively, acceptable performance in such research and applications depends chiefly on working with information of adequate quality. However, Twitter has turned out to be a fertile environment for publishing noisy information in different forms. Consequently, maintaining high quality is a serious challenge that requires great effort from Twitter's administrators and researchers to address information quality issues. Social spam is a common type of noisy information, created and circulated by ill-intentioned users, so-called social spammers. More precisely, they misuse all possible services provided by Twitter to propagate their spam content, causing substantial information pollution to flow through Twitter's network. As Twitter's anti-spam mechanism is neither effective against nor immune to the spam problem, a large body of research has been dedicated to developing methods that detect and filter out spam accounts and tweets. However, these methods are not scalable when handling large-scale Twitter data. Indeed, they require, as a mandatory step, additional information from Twitter's servers, which is limited to a small number of requests per 15-minute time window. This is the main barrier to making these methods effective at scale, since handling large-scale Twitter data would take months. Instead of inspecting every account in a given large-scale Twitter data set sequentially or randomly, in this paper we explore the applicability of information retrieval (IR) concepts to retrieve a subset of accounts with a high probability of being spam. Specifically, we introduce the design of an unsupervised method that partially processes a large collection of tweets to generate spam queries related to account attributes. Then, the spam queries are issued to retrieve and rank the most likely spam accounts in the given large-scale collection of Twitter accounts. Our experimental evaluation shows the efficiency of generating spam queries from different attributes to retrieve spam accounts, in terms of precision, recall, and normalized discounted cumulative gain at different ranks.
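The three evaluation measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code. It assumes binary relevance (an account either is or is not labeled spam) and the common log2 discount for discounted cumulative gain; the abstract states neither choice. The function names and account IDs below are hypothetical.

```python
import math


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k retrieved accounts that are labeled spam."""
    hits = sum(1 for account in ranked[:k] if account in relevant)
    return hits / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all labeled spam accounts found in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for account in ranked[:k] if account in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Normalized DCG at rank k with binary gains (assumption: gain is 0 or 1)."""
    # DCG: each hit at 1-based rank i contributes 1 / log2(i + 1).
    dcg = sum(
        1.0 / math.log2(i + 1)
        for i, account in enumerate(ranked[:k], start=1)
        if account in relevant
    )
    # Ideal DCG: all relevant accounts placed at the top of the ranking.
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(i + 1) for i in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical ranking returned by a spam query, and hypothetical ground truth.
ranked_accounts = ["a", "b", "c", "d", "e"]
spam_accounts = {"a", "c", "f"}

p3 = precision_at_k(ranked_accounts, spam_accounts, 3)
r3 = recall_at_k(ranked_accounts, spam_accounts, 3)
n3 = ndcg_at_k(ranked_accounts, spam_accounts, 3)
```

Precision and recall at rank k ignore the order inside the top k, while NDCG rewards rankings that place spam accounts nearer the top, which is why it is usually reported at several ranks.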
