Unsupervised collective-based framework for dynamic retraining of supervised real-time spam tweets detection model

Washha Mandi; Qaroush Aziz; Mezghani Manel; Sedes Florence

Abstract

Twitter is one of the most popular social platforms. Its real-time messaging mechanism has changed how people communicate and disseminate information. Recently, researchers and industry have used it as a new source of data for various intelligent systems, such as tweet sentiment analysis and recommendation systems, which require high data quality. However, because of its flexibility and popularity, Twitter has become a main target for spamming activities such as phishing legitimate users or spreading malicious software, which introduces new security issues and wastes resources. Researchers have therefore developed various machine-learning algorithms to reveal Twitter spam. However, as spammers have become smarter and more crafty, the characteristics of spam tweets vary over time, making these methods ineffective at detecting spammers' new tricks and strategies. In addition, some of the employed methods (e.g. blacklisting) or spammer features (e.g. graph-based features) are extremely time-consuming, which hinders the ability to detect spammer activities in real time. In this paper, we introduce a framework to deal with the volatility of spam content and new spamming patterns, a problem called spam drift. The framework uses the strength of an unsupervised machine learning approach, which learns from unlabeled tweets, to retrain a real-time supervised tweet-level spam detection model in batch mode. A set of experiments on a large-scale data set shows the effectiveness of the proposed online unsupervised method in adaptively discovering and learning the patterns of new spam activities and in achieving stable recall values of more than 95%. Although the average spam precision of our method is around 60%, the high spam recall values show the ability of our proposed method to reduce spam drift problems compared to traditional machine learning algorithms. (C) 2019 Elsevier Ltd. All rights reserved.
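
The abstract does not spell out how the unsupervised component labels tweets, so the following is only a minimal Python sketch of the general loop it describes: a supervised tweet-level model scores tweets as they arrive, and after each batch it is retrained on labels produced without human annotation. The names `pseudo_label`, `TokenModel`, and `process_stream` are hypothetical, and the repeated-text heuristic is an assumed stand-in, not the authors' collective-based method.

```python
from collections import Counter


def normalize(text):
    """Lowercase and collapse whitespace so repeated tweets compare equal."""
    return " ".join(text.lower().split())


def pseudo_label(batch, min_group=3):
    """Assumed stand-in for the unsupervised step: a tweet whose normalized
    text appears at least `min_group` times in the batch is labeled spam."""
    counts = Counter(normalize(t) for t in batch)
    return [(t, counts[normalize(t)] >= min_group) for t in batch]


class TokenModel:
    """Toy supervised tweet-level classifier based on token vote counts."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()

    def fit(self, labeled):
        # Retrain from scratch on the latest pseudo-labeled batch only.
        self.spam.clear()
        self.ham.clear()
        for text, is_spam in labeled:
            target = self.spam if is_spam else self.ham
            target.update(set(normalize(text).split()))

    def predict(self, text):
        tokens = set(normalize(text).split())
        spam_votes = sum(self.spam[t] for t in tokens)
        ham_votes = sum(self.ham[t] for t in tokens)
        return spam_votes > ham_votes


def process_stream(model, batches):
    """Score each batch with the current model, then retrain in batch mode."""
    predictions = []
    for batch in batches:
        predictions.append([model.predict(t) for t in batch])
        model.fit(pseudo_label(batch))
    return predictions
```

In this sketch the model is rebuilt from the most recent batch alone, which keeps it responsive to drift at the cost of forgetting older patterns; a real system would likely weigh that trade-off differently.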

Bibliographic details

Source
Expert Systems with Applications | 2019, Issue 11 | pp. 129-152 | 24 pages
Authors
Washha Mandi; Qaroush Aziz; Mezghani Manel; Sedes Florence
Author affiliations

Univ Toulouse IRIT CNRS INPT UPS UTI Toulouse France;

Birzeit Univ Dept Elect & Comp Engn Ramallah Palestine;

Univ Toulouse IRIT CNRS INPT UPS UTI Toulouse France;

Univ Toulouse IRIT CNRS INPT UPS UTI Toulouse France;

Original format: PDF
Language: English
Keywords
Twitter; Real-time; Spam; Social spammers; Twitter stream;

