Content Based Spam Detection In Short Text Messages With Emphasis On Dealing With Imbalanced Datasets

Abstract

Short text messages are an important means of helping people connect with each other via their cell phones. The growing popularity of these messages is at times hindered by unwanted messages and advertisements, called spam, that are also sent as text messages. This behaviour can be irritating for the recipient. Automatic spam filters are used to identify these unwanted messages and keep them out of users' inboxes. Approaches to the spam detection problem have been either content based or heuristic based. The proposed work puts forward a content-based machine learning approach, with special emphasis on the fact that the datasets are imbalanced, which reflects the real-world scenario in spam detection. Experiments were performed with popular machine learning algorithms such as SVM, AdaBoost, Bagging and J48 to find the classifier that deals best with imbalanced datasets; that classifier was then used to experiment with techniques for handling imbalance. Identifying discriminating features, applying feature reduction techniques and dealing with issues related to imbalanced datasets are the major milestones of the proposed work. The SMOTE technique is applied to deal with imbalanced datasets. SVM in combination with SMOTE exhibited the best performance, with an improvement of 7 points on the JSC dataset and 3 points on the UCI dataset over the imbalanced datasets; results are reported in average class accuracy.
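
The two ideas the abstract leans on, SMOTE over-sampling and average class accuracy, can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the function names `smote` and `average_class_accuracy`, the parameter `k`, and the toy feature vectors are all assumptions made for the example, and "average class accuracy" is assumed here to mean the mean of the per-class accuracies. The classifier itself (SVM in the paper) is left out.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def average_class_accuracy(y_true, y_pred):
    """Mean of per-class accuracies (fraction correct within each true class)."""
    per_class = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        per_class.append(correct / len(idx))
    return sum(per_class) / len(per_class)


# Toy, made-up feature vectors (not taken from the JSC or UCI datasets).
spam = [[1.0, 0.9], [0.8, 1.0], [0.9, 0.7]]
ham = [[0.1, 0.2], [0.0, 0.1], [0.2, 0.0], [0.1, 0.0], [0.3, 0.1], [0.2, 0.2]]

# Over-sample the minority class until both classes have the same size.
balanced_spam = spam + smote(spam, n_new=len(ham) - len(spam), k=2)

# A classifier that always predicts "ham" gets 6 of 9 messages right,
# yet scores only 0.5 on average class accuracy.
y_true = ["spam"] * 3 + ["ham"] * 6
y_pred = ["ham"] * 9
print(len(balanced_spam), average_class_accuracy(y_true, y_pred))
```

In practice the synthetic samples would normally be added to the training split only, so that the evaluation still reflects the original class distribution.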

Bibliographic record

Source
International Conference on Computing Communication Control and Automation | 2018 | 602p | 5 pages
Authors
Payal Aich; Manju Venugopalan; Deepa Gupta
Original format: PDF
Chinese Library Classification: TN91-53
Keywords
SVM; AdaBoost; Bagging; J48; SMOTE; Over-Sampling; Under-Sampling

