Conference: International Conference on Big Data Analytics

PRISMO: Priority Based Spam Detection Using Multi Optimization

Abstract

The rapid growth of social networking sites such as Twitter, Facebook, Google+, MySpace, Snapchat, and Instagram, along with their local variants such as Weibo and Hyves, has left them infiltrated by a large amount of spamming activity. Based on its features, an account or a piece of content can be classified as spam or benign. The presence of irrelevant features degrades the performance of the classifier and the understandability of the dataset, and it increases the time required for training and classification. Feature subset selection is therefore an essential phase of the machine learning process. Its objective is to choose, from the full set of 'n' features, a subset of size 's' (s < n) that yields the lowest classification error. Feature subset selection can thus be framed as an optimization problem whose objective is to choose a near-optimal subset of features. The literature survey indicates that a classifier performs best when high-dimensional data is reduced so that it contains only relevant features with low redundancy. The contribution of this paper is to optimize the feature subset and its cost simultaneously. The fundamental idea of PRISMO is to generate a primary feature subset through various optimization algorithms in the initialization stage. The final subset is then generated from this initial feature set according to feature priority, using basic rules of conjunction and disjunction. To evaluate the overall efficiency of PRISMO, various experiments were carried out on different datasets. The results show that the proposed model effectively reduces the cardinality of the feature set without bias toward a specific dataset and without degrading classifier accuracy.
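The abstract does not specify how the priority and the conjunction/disjunction rules are applied, so the following is only a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes that each optimization algorithm returns a set of feature names (the optimizer keys and feature names below are hypothetical placeholders), that a feature's priority is the number of optimizers that selected it, that the conjunction (intersection) of all subsets forms a core that is always kept, and that any remaining slots up to the target size s are filled from the disjunction (union) in descending priority.

```python
from collections import Counter
from typing import Dict, List, Set


def combine_by_priority(subsets: Dict[str, Set[str]], target_size: int) -> List[str]:
    """Hypothetical priority-based merge of feature subsets from several optimizers.

    Assumption (not from the paper): a feature's priority is the number of
    optimizers that selected it.
    """
    if not subsets:
        return []
    all_sets = list(subsets.values())
    # Conjunction (AND): features that every optimizer agreed on form the core.
    core = set.intersection(*all_sets)
    # Disjunction (OR): every feature proposed by at least one optimizer.
    candidates = set.union(*all_sets)
    # Priority = how many optimizers voted for each feature.
    votes = Counter(f for s in all_sets for f in s)
    # Core first, then descending votes; ties broken by name for determinism.
    ranked = sorted(candidates, key=lambda f: (f not in core, -votes[f], f))
    # The whole core is kept even if it is larger than target_size.
    size = max(target_size, len(core))
    return ranked[:size]


if __name__ == "__main__":
    # Hypothetical outputs of three optimizers on a spam-detection feature set.
    subsets = {
        "opt_a": {"followers", "url_count", "hashtags", "account_age"},
        "opt_b": {"followers", "url_count", "retweets"},
        "opt_c": {"followers", "hashtags", "url_count", "mentions"},
    }
    print(combine_by_priority(subsets, 3))  # ['followers', 'url_count', 'hashtags']
```

Keeping the conjunction core unconditionally favours features that all optimizers agree on; a stricter variant could return only the intersection, and a looser one the full union.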