International Conference on Big Data Analytics
PRISMO: Priority Based Spam Detection Using Multi Optimization

Abstract

The rapid growth of social networking sites such as Twitter, Facebook, Google+, MySpace, Snapchat, and Instagram, along with local variants such as Weibo and Hyves, has left them infiltrated by a large volume of spamming activity. Based on its features, an account or piece of content can be classified as spam or benign. Irrelevant features degrade classifier performance, make the dataset harder to understand, and increase the time required for training and classification. Feature subset selection is therefore an essential phase of the machine learning process. Its objective is to choose a subset of size s (s < n) from the full set of n features that yields the lowest classification error. The problem can be posed as an optimization problem whose objective is to choose a near-optimal subset of features. According to the literature survey, a classifier performs best when high-dimensional data is reduced to only the relevant features with low redundancy. The contribution of this paper is to optimize the feature subset and its cost simultaneously. The fundamental idea of PRISMO is to generate a primary feature subset through various optimization algorithms during the initialization stage. The subset is then generated from the initial feature set according to feature priority, using basic rules of conjunction and disjunction. To evaluate the overall efficiency of PRISMO, experiments were carried out on different datasets. The results show that the proposed model effectively reduces the number of features without bias toward a specific dataset and without degrading classifier accuracy.
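The abstract does not spell out the conjunction and disjunction rules or how priority is assigned, so the following is only a minimal sketch under stated assumptions: several optimizers each propose a candidate feature subset, conjunction is taken as set intersection, disjunction as set union, and priority is approximated by how many candidates selected a feature. The function names, the vote-count priority, and the feature names are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set


def combine_subsets(subsets: List[Set[str]]) -> Dict[str, Set[str]]:
    """Combine candidate feature subsets by conjunction and disjunction.

    Assumption: conjunction = features chosen by every candidate,
    disjunction = features chosen by at least one candidate.
    """
    if not subsets:
        return {"conjunction": set(), "disjunction": set()}
    return {
        "conjunction": set.intersection(*subsets),
        "disjunction": set.union(*subsets),
    }


def rank_by_votes(subsets: List[Set[str]]) -> List[str]:
    """Order features by how many candidates selected them.

    Assumption: vote count stands in for the paper's notion of priority.
    Ties are broken alphabetically to keep the ordering deterministic.
    """
    votes: Dict[str, int] = {}
    for subset in subsets:
        for feature in subset:
            votes[feature] = votes.get(feature, 0) + 1
    return sorted(votes, key=lambda f: (-votes[f], f))


# Hypothetical candidate subsets, one per optimization algorithm.
candidates = [
    {"followers", "urls", "hashtags"},
    {"followers", "urls", "mentions"},
    {"followers", "urls", "account_age"},
]

combined = combine_subsets(candidates)
ranking = rank_by_votes(candidates)
```

In this sketch the conjunction gives a small, high-agreement subset and the disjunction gives the widest candidate pool; a priority ranking such as `ranking` could then decide which features from the pool to keep up to a chosen size s.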
