Journal: Procedia Computer Science

Evaluating the Effectiveness of Machine Learning Methods for Spam Detection

Abstract

Technological advances are accelerating the dissemination of information. Today, millions of devices and their users are connected to the Internet, allowing businesses to interact with consumers regardless of geography. People all over the world send and receive emails every day. Email is an effective, simple, fast, and cheap way to communicate. Emails can be divided into two types: spam and ham (legitimate mail). More than half of the messages a user receives are spam. Using email efficiently without the risk of losing personal information requires a spam filtering system. The aim of this work is to reduce the amount of spam by using a classifier to detect it. The most accurate spam classification can be achieved using machine learning methods. A natural language processing approach was chosen to analyze the text of an email in order to detect spam. For comparison, the following machine learning algorithms were selected: Naive Bayes (NB), K-Nearest Neighbors, SVM, logistic regression, decision tree, and random forest. The models were trained on a ready-made dataset. Logistic regression and NB give the highest accuracy, up to 99%. The results can be used to create a more intelligent spam detection classifier by combining algorithms or filtering methods.
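
As an illustration of the text-classification approach described in the abstract, the following is a minimal sketch of one of the listed algorithms, Naive Bayes, applied to email text. It is not the authors' implementation: the abstract does not state which Naive Bayes variant, features, or dataset were used, so the multinomial word-count model with add-one smoothing, the `NaiveBayesSpamFilter` class, the `tokenize` helper, and the six toy training messages are all assumptions made for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSpamFilter:
    """Hypothetical multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        # Number of training messages per class, used for the class priors.
        self.class_counts = Counter(labels)
        # Word frequencies per class, used for the word likelihoods.
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def log_scores(self, text):
        """Return the unnormalized log posterior of each class for one message."""
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy data invented for this sketch; it is not the dataset used in the paper.
train_texts = [
    "Win a free prize now, click here",
    "Cheap loans, limited offer, act now",
    "Free money, claim your prize today",
    "Meeting moved to Monday at noon",
    "Please review the attached project report",
    "Lunch on Friday with the project team",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

model = NaiveBayesSpamFilter().fit(train_texts, train_labels)
print(model.predict("claim your free prize now"))   # spam
print(model.predict("project meeting on Monday"))   # ham
```

With only six training messages, this sketch says nothing about the accuracy figures reported in the abstract; it only shows how word frequencies per class drive the spam/ham decision.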