Journal: Expert Systems with Applications

Unsupervised collective-based framework for dynamic retraining of supervised real-time spam tweets detection model


Abstract

Twitter is one of the most popular social platforms. Its real-time messaging mechanism has changed how people communicate and disseminate information. Recently, researchers and industry have used it as a new source of data for various intelligent systems, such as tweet sentiment analysis and recommendation systems, which require high data quality. However, due to its flexibility and popularity, Twitter has become the main target for spamming activities such as phishing legitimate users or spreading malicious software, which introduce new security issues and waste resources. Therefore, researchers have developed various machine-learning algorithms to detect Twitter spam. However, as spammers have become smarter and craftier, the characteristics of spam tweets vary over time, making these methods ineffective at detecting new spammer tricks and strategies. In addition, some of the employed methods (e.g. blacklisting) or spammer features (e.g. graph-based features) are extremely time-consuming, which hinders the ability to detect spammer activities in real time. In this paper, we introduce a framework to deal with the volatility of spam content and the emergence of new spamming patterns, called spam drift. The framework uses the strength of an unsupervised machine learning approach, which learns from unlabeled tweets, to retrain a real-time supervised tweet-level spam detection model in batch mode. A set of experiments on a large-scale data set shows the effectiveness of the proposed online unsupervised method in adaptively discovering and learning the patterns of new spam activities and in achieving stable recall values of more than 95%. Although the average spam precision of our method is around 60%, the high spam recall values show the ability of our proposed method to reduce spam drift problems compared to traditional machine learning algorithms. (C) 2019 Elsevier Ltd. All rights reserved.
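The batch-mode retraining idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not specify the unsupervised method or the supervised model, so `duplicate_labeler` (which flags tweets repeated verbatim within a batch) and `KeywordSpamModel` (a toy token-based classifier) are hypothetical stand-ins, as is `retrain_on_batches`. The sketch only shows the overall loop: an unsupervised step assigns pseudo-labels to each unlabeled batch, and the supervised model is refit on those labels before it classifies new tweets.

```python
from collections import Counter
from typing import Callable, Iterable, List

# Hypothetical type for the unsupervised step: unlabeled batch -> spam flags.
Labeler = Callable[[List[str]], List[bool]]


class KeywordSpamModel:
    """Toy supervised model (an assumption, not the paper's classifier).

    It remembers tokens that appear in more spam tweets than ham tweets.
    """

    def __init__(self) -> None:
        self.spam_tokens: set = set()

    def fit(self, tweets: List[str], labels: List[bool]) -> None:
        spam, ham = Counter(), Counter()
        for text, is_spam in zip(tweets, labels):
            # Count each token once per tweet.
            (spam if is_spam else ham).update(set(text.lower().split()))
        self.spam_tokens = {tok for tok, n in spam.items() if n > ham[tok]}

    def predict(self, tweet: str) -> bool:
        return any(tok in self.spam_tokens for tok in tweet.lower().split())


def duplicate_labeler(batch: List[str]) -> List[bool]:
    """Stand-in unsupervised labeler: a tweet repeated in the batch is spam."""
    counts = Counter(text.lower() for text in batch)
    return [counts[text.lower()] > 1 for text in batch]


def retrain_on_batches(model: KeywordSpamModel,
                       batches: Iterable[List[str]],
                       labeler: Labeler) -> KeywordSpamModel:
    """Refit the model on pseudo-labels from each unlabeled batch in turn."""
    for batch in batches:
        pseudo_labels = labeler(batch)   # no human labels involved
        model.fit(batch, pseudo_labels)  # batch-mode retraining
    return model


old_batch = ["win free iphone now", "win free iphone now",
             "lunch with friends today"]
new_batch = ["claim your crypto bonus", "claim your crypto bonus",
             "free coffee at work"]

model = retrain_on_batches(KeywordSpamModel(), [old_batch], duplicate_labeler)
caught_before = model.predict("crypto bonus inside")

model = retrain_on_batches(model, [new_batch], duplicate_labeler)
caught_after = model.predict("crypto bonus inside")
```

In this toy run the model trained only on the older batch misses the new spam pattern, while the model refit on the newer batch catches it, which is the kind of drift the abstract describes. A real system would need a far stronger labeler and classifier; a cheap labeler like this one tends to trade precision for recall.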
