Towards Generating Spam Queries for Retrieving Spam Accounts in Large-Scale Twitter Data


Abstract

Twitter, as a top microblogging site, has become a valuable source of up-to-date, real-time information for a wide range of social research and applications. Intuitively, the performance of such research and applications depends mainly on working with information of adequate quality. However, Twitter has turned out to be a fertile environment for publishing noisy information in different forms. Consequently, maintaining high information quality is a serious challenge that requires great effort from Twitter's administrators and researchers. Social spam is a common type of noisy information, created and circulated by ill-intentioned users known as social spammers. More precisely, they misuse all possible services provided by Twitter to propagate their spam content, causing large-scale information pollution in Twitter's network. As Twitter's anti-spam mechanism is neither effective against nor immune to the spam problem, a great deal of research has been dedicated to developing methods that detect and filter out spam accounts and tweets. However, these methods do not scale to large-scale Twitter data. Indeed, as a mandatory step, they need additional information from Twitter's servers, which limit clients to a small number of requests per 15-minute time window; this is the main barrier to making these methods effective, since handling large-scale Twitter data would take months. Instead of inspecting every account in a given large-scale Twitter data set sequentially or at random, in this paper we explore the applicability of information retrieval (IR) concepts to retrieve a subset of accounts that have a high probability of being spam. Specifically, we introduce the design of an unsupervised method that partially processes a large-scale collection of tweets to generate spam queries related to accounts' attributes. Then, the spam queries are issued to retrieve and rank the accounts most likely to be spam in the given large-scale collection of Twitter accounts. Our experimental evaluation shows the efficiency of generating spam queries from different attributes to retrieve spam accounts in terms of precision, recall, and normalized discounted cumulative gain at different ranks.
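The abstract names three ranking metrics without defining them. As a rough illustration only, the sketch below computes precision, recall, and normalized discounted cumulative gain at rank k using their standard textbook definitions with binary relevance (an account is either spam or not). The function names, the account identifiers, and the binary relevance assumption are hypothetical choices for this example and are not taken from the paper.

```python
import math

def precision_at_k(ranked, spam, k):
    """Fraction of the top-k retrieved accounts that are spam."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for acc in ranked[:k] if acc in spam) / k

def recall_at_k(ranked, spam, k):
    """Fraction of all known spam accounts that appear in the top k."""
    if not spam:
        return 0.0
    return sum(1 for acc in ranked[:k] if acc in spam) / len(spam)

def ndcg_at_k(ranked, spam, k):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    # Position i (0-based) is discounted by log2(i + 2).
    dcg = sum(1.0 / math.log2(i + 2)
              for i, acc in enumerate(ranked[:k]) if acc in spam)
    # The ideal ranking places every spam account first.
    ideal_hits = min(k, len(spam))
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical ranking returned by a spam query, and hypothetical ground truth.
ranked_accounts = ["a", "b", "c", "d"]
spam_accounts = {"a", "c"}

for k in (2, 4):
    print(k,
          precision_at_k(ranked_accounts, spam_accounts, k),
          recall_at_k(ranked_accounts, spam_accounts, k),
          round(ndcg_at_k(ranked_accounts, spam_accounts, k), 4))
```

Evaluating at several values of k, as in the loop above, corresponds to reporting the metrics "at different ranks"; the paper's actual relevance grading and cut-off values may differ.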


Bibliographic details

Source
International Conference on Enterprise Information Systems | 2017 | pp. 387-412 | 26 pages
Authors
Mahdi Washha; Aziz Qaroush; Manel Mezghani; Florence Sedes
Original format: PDF
Keywords
Twitter; Social networks; Spam query


Similar literature


1. Behavioural account-based features for filtering out social spammers in large-scale twitter data collections [J]. Mahdi Washha, Manel Mezghani, Florence Sedes. Ingenierie des Systemes d'Information. 2017, No. 3

2. Twitter spam account detection based on clustering and classification methods [J]. Adewole Kayode Sakariyah, Hang Tao, Wu Wanqing. Journal of Supercomputing. 2020, No. 7

3. Detection of spam-posting accounts on Twitter [J]. Inuwa-Dutse Isa, Liptrott Mark, Korkontzelos Ioannis. Neurocomputing. 2018, No. NOVa13

4. Towards Generating Spam Queries for Retrieving Spam Accounts in Large-Scale Twitter Data [C]. Mahdi Washha, Aziz Qaroush, Manel Mezghani. International Conference on Enterprise Information Systems. 2018

5. Clustering spam domains and hosts: Anti-spam forensics with data mining. [D]. Wei, Chun. 2010

6. Spam spam spam spam spam … [O]. Neville Goodman. 2004

7. Information Quality in Social Networks: Predicting Spammy Naming Patterns for Retrieving Twitter Spam Accounts [O]. Mahdi Washha, Aziz Qaroush, Manel Mezghani. 2017
