Published in: Journal of Software

Application of Linear Classifier on Chinese Spam Filtering

Abstract

Spam is a key problem in electronic communication, especially in large-scale email systems. Content-based filtering is one mainstream method of combating this threat: an email filtering system can learn directly from a user's mail set. However, previous content-based filtering methods have struggled to balance efficiency and effectiveness. Text categorization algorithms such as Naive Bayes, kNN, Decision Tree, and Boosting can be applied to spam filtering, but the effectiveness of Naive Bayes is limited and it is not suited to instant feedback learning. Other algorithms, such as SVM, are more effective but computationally expensive. Because a real email system often has to handle a large volume of email in a short time, efficiency is often as important as effectiveness when implementing an anti-spam filtering method. We therefore look to linear classifiers to solve this problem and explore two fast online linear classifiers for the task: the Perceptron and Winnow. Training of both methods is online and mistake-driven, and both are suitable for feedback. We apply the two methods to three benchmark corpora: PU1, Ling-Spam, and 2005-Jun. The experiments on these public email corpora show that both methods are effective. We conclude that the two online linear classifiers achieve state-of-the-art performance for spam filtering, especially for Chinese spam emails.
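To make "online and mistake-driven" concrete, the following is a minimal sketch of textbook Perceptron and Winnow updates over binary token features. It is not the authors' implementation: the `tokenize` helper, the toy `DATA` set, and the parameters `alpha` and `theta` are all assumptions introduced for illustration, and real Chinese mail would need a word segmenter rather than whitespace splitting.

```python
def tokenize(text):
    """Hypothetical tokenizer: lowercase whitespace split into a set of binary features."""
    return set(text.lower().split())


class Perceptron:
    """Additive, mistake-driven updates. Labels: +1 = spam, -1 = ham."""

    def __init__(self):
        self.w = {}   # token -> weight, implicitly 0.0 for unseen tokens
        self.b = 0.0  # bias term

    def predict(self, feats):
        score = self.b + sum(self.w.get(f, 0.0) for f in feats)
        return 1 if score > 0 else -1

    def learn(self, feats, label):
        """Update only when the current prediction is wrong; return True on a mistake."""
        if self.predict(feats) == label:
            return False
        for f in feats:
            self.w[f] = self.w.get(f, 0.0) + label
        self.b += label
        return True


class Winnow:
    """Multiplicative, mistake-driven updates (promotion/demotion variant).

    alpha (update factor) and theta (decision threshold) are assumed values,
    not parameters taken from the paper.
    """

    def __init__(self, alpha=2.0, theta=4.0):
        self.alpha = alpha
        self.theta = theta
        self.w = {}  # token -> weight, implicitly 1.0 for unseen tokens

    def predict(self, feats):
        score = sum(self.w.get(f, 1.0) for f in feats)
        return 1 if score > self.theta else -1

    def learn(self, feats, label):
        """Promote active weights on a missed spam, demote them on a false alarm."""
        if self.predict(feats) == label:
            return False
        factor = self.alpha if label == 1 else 1.0 / self.alpha
        for f in feats:
            self.w[f] = self.w.get(f, 1.0) * factor
        return True


def train(model, examples, epochs=5):
    """Present examples one at a time; stop early after a mistake-free pass."""
    for _ in range(epochs):
        mistakes = sum(model.learn(tokenize(text), y) for text, y in examples)
        if mistakes == 0:
            break
    return model


# Toy data, invented for illustration only.
DATA = [
    ("win free money now", 1),
    ("meeting agenda for monday", -1),
    ("free prize click now", 1),
    ("project report attached", -1),
]

perceptron = train(Perceptron(), DATA)
winnow = train(Winnow(), DATA)
```

Because both models touch only the weights of tokens present in a misclassified message, a single user correction can be fed straight to `learn`, which is what makes this family of classifiers convenient for feedback.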