Conference: International Conference on Computing Communication Control and Automation

Content Based Spam Detection In Short Text Messages With Emphasis On Dealing With Imbalanced Datasets

Abstract

Short text messages are an important means of connecting people through their cell phones. Their growing popularity is at times undermined by unwanted messages and advertisements sent over the same channel, known as spam, which can be irritating for the recipient. Automatic spam filters are used to identify these unwanted messages and keep them out of users' inboxes. Approaches to the spam detection problem have been either content based or heuristic based. The proposed work puts forward a content-based machine learning approach with special emphasis on the fact that the datasets are imbalanced, which reflects the real-world scenario in spam detection. Experiments were performed with popular machine learning algorithms such as SVM, AdaBoost, Bagging and J48 to find the classifier that deals best with imbalanced datasets, and that classifier was then used to experiment with techniques for handling the imbalance. Identifying discriminating features, applying feature reduction techniques and dealing with issues related to imbalanced datasets are the major milestones of the proposed work. The SMOTE technique is applied to deal with imbalanced datasets. SVM in combination with SMOTE exhibited the best performance, with an improvement of 7 points on the JSC dataset and 3 points on the UCI dataset over the imbalanced datasets; the results are reported in average class accuracy.
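The two ideas the abstract leans on, SMOTE oversampling and average class accuracy, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `smote` and `average_class_accuracy`, the toy feature vectors and the labels are all hypothetical, the SVM stage is omitted, and "average class accuracy" is assumed here to mean the unweighted mean of per-class accuracy (per-class recall). The SMOTE part follows the basic idea of the technique: pick a minority sample, pick one of its k nearest minority neighbours, and place a synthetic point at a random position on the segment between them.

```python
import math
import random
from collections import defaultdict


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base sample (itself excluded)
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def average_class_accuracy(y_true, y_pred):
    """Unweighted mean of per-class accuracy, so each class counts equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


# Toy data (made up): three "spam" feature vectors, oversampled with four more.
spam = [[1.0, 0.0], [0.9, 0.2], [0.8, 0.1]]
extra = smote(spam, n_new=4, k=2)

# A classifier that misses one of two spam messages scores 90% plain accuracy
# but only 0.75 average class accuracy (ham 8/8, spam 1/2).
y_true = ["ham"] * 8 + ["spam"] * 2
y_pred = ["ham"] * 9 + ["spam"] * 1
print(average_class_accuracy(y_true, y_pred))  # 0.75
```

The example shows why a per-class average is a natural reporting metric for imbalanced data: plain accuracy is dominated by the majority class, whereas the per-class mean penalises errors on the rare spam class equally.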
