Journal: Expert Systems with Applications

Unsupervised collective-based framework for dynamic retraining of supervised real-time spam tweets detection model


Abstract

Twitter is one of the most popular social platforms. Its real-time messaging mechanism has changed the way people communicate and disseminate information. Recently, researchers and industry have used it as a new source of data for various intelligent systems, such as tweet sentiment analysis and recommendation systems, which require high data quality. However, due to its flexibility and popularity, Twitter has become the main target for spamming activities such as phishing legitimate users or spreading malicious software, which introduce new security issues and waste resources. Researchers have therefore developed various machine-learning algorithms to detect Twitter spam. However, as spammers have become smarter and craftier, the characteristics of spam tweets vary over time, which makes these methods ineffective at detecting spammers' new tricks and strategies. In addition, some of the employed methods (e.g. blacklisting) or spammer features (e.g. graph-based features) are extremely time-consuming, which hinders the detection of spammer activities in real time. In this paper, we introduce a framework to deal with the volatility of spam content and the emergence of new spamming patterns, a problem called spam drift. The framework uses the strength of an unsupervised machine learning approach, which learns from unlabeled tweets, to retrain a real-time supervised tweet-level spam detection model in batch mode. A set of experiments on a large-scale data set shows that the proposed online unsupervised method adaptively discovers and learns the patterns of new spam activities and achieves stable recall values of more than 95%. Although the average spam precision of our method is around 60%, the high spam recall values show that the proposed method reduces the spam drift problem compared with traditional machine learning algorithms. (C) 2019 Elsevier Ltd. All rights reserved.
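
The batch-mode retraining idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the unsupervised step or the classifier, so the duplicate-text grouping rule, the `min_users` threshold, the toy `TokenSpamModel`, and all function names below are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


class TokenSpamModel:
    """Toy supervised tweet-level model (hypothetical, not from the paper).

    A tweet is predicted as spam when its tokens were seen more often
    in spam training tweets than in non-spam ones.
    """

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()

    def fit(self, texts, labels):
        # Rebuild the token counts from scratch on the full training set.
        self.spam.clear()
        self.ham.clear()
        for text, is_spam in zip(texts, labels):
            target = self.spam if is_spam else self.ham
            target.update(set(tokenize(text)))
        return self

    def predict(self, text):
        score = sum(self.spam[t] - self.ham[t] for t in set(tokenize(text)))
        return score > 0


def pseudo_label(batch, min_users=3):
    """Unsupervised step (assumed rule): the same normalized text posted by
    at least `min_users` distinct accounts is treated as spam."""
    users_by_text = defaultdict(set)
    for user, text in batch:
        users_by_text[" ".join(tokenize(text))].add(user)
    return [
        len(users_by_text[" ".join(tokenize(text))]) >= min_users
        for _, text in batch
    ]


def retrain_on_batch(model, train_texts, train_labels, batch):
    """Append the pseudo-labeled batch to the training set and refit."""
    train_texts.extend(text for _, text in batch)
    train_labels.extend(pseudo_label(batch))
    return model.fit(train_texts, train_labels)
```

In this sketch, a model trained only on older spam would miss a new campaign until `retrain_on_batch` is called with an unlabeled batch in which several accounts post the same text. Labeling every non-duplicated tweet as non-spam is a crude choice that a real system would need to refine.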
