Journal of Information Science

Spam profiles detection on social networks using computational intelligence methods: The effect of the lingual context

Abstract

In online social networks, spam profiles represent one of the most serious security threats on the Internet: left unchecked, they do not stop at spreading bad advertisements, because criminals can exploit them for various purposes. This article examines the nature and characteristics of spam profiles on a social network such as Twitter, using a number of publicly available, language-independent features, with the aim of improving spam detection. To investigate how effective these features are for spam detection, four datasets are extracted for four different lingual contexts (Arabic, English, Korean and Spanish), and a fifth is formed by combining all four. We conduct our experiments with five classification algorithms that are well known in the spam detection field, namely k-Nearest Neighbours (k-NN), Random Forest (RF), Naive Bayes (NB), Decision Tree (DT) (J48) and Multilayer Perceptron (MLP), together with five filter-based feature selection methods: Information Gain, Chi-square, ReliefF, Correlation and Significance. The results show that the performance of each classifier fluctuates across the datasets, and that feature selection improves the classification results. In addition, detailed analyses and comparisons are carried out at two levels: at the first level, we compare the importance that the different feature selection methods assign to the selected features; at the second level, we examine the relations among the selected features and their importance across all datasets. The findings of this article lead to a better understanding of social spam and to improved detection methods that take into account the important features arising from the different lingual contexts.
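Information Gain, the first of the filter methods named above, scores each feature by how much knowing its value reduces uncertainty about the class label; a filter then keeps only the top-ranked features before any classifier is trained. The sketch below illustrates that rank-then-truncate step in plain Python. It is not the authors' implementation, and it assumes that features are already discretized and labels are "spam"/"ham". The feature names (`default_image`, `high_url_ratio`, `noise_flag`) and the six-row toy dataset are hypothetical and are not taken from the article's data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - sum_v P(X = v) * H(Y | X = v) for one discrete feature X."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(rows, labels, feature_names):
    """Score every feature column by information gain, highest score first."""
    scores = {}
    for j, name in enumerate(feature_names):
        column = [row[j] for row in rows]
        scores[name] = information_gain(column, labels)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def select_top_k(rows, labels, feature_names, k):
    """Filter-based selection: keep only the names of the k highest-scoring features."""
    return [name for name, _ in rank_features(rows, labels, feature_names)[:k]]


# Hypothetical toy data: each row is one profile with three binary features.
TOY_FEATURES = ["default_image", "high_url_ratio", "noise_flag"]
TOY_ROWS = [
    (1, 1, 0),
    (1, 1, 1),
    (1, 0, 1),
    (0, 0, 0),
    (0, 0, 1),
    (0, 1, 1),
]
TOY_LABELS = ["spam", "spam", "spam", "ham", "ham", "ham"]

if __name__ == "__main__":
    for name, score in rank_features(TOY_ROWS, TOY_LABELS, TOY_FEATURES):
        print(f"{name}: {score:.4f}")
    print(select_top_k(TOY_ROWS, TOY_LABELS, TOY_FEATURES, 2))
```

In this toy data `default_image` separates the classes perfectly (gain of 1 bit), `high_url_ratio` carries a little information, and `noise_flag` carries none, so keeping the top two features drops `noise_flag`. The other filters listed in the abstract (Chi-square, ReliefF, Correlation, Significance) would slot into the same ranking step with a different scoring function.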