An Online Linear Chinese Spam Emails Filtering System

Abstract

Spam is a key problem in electronic communication. The increasing volume of spam has become a serious threat not only to the Internet but also to society. Content-based filtering is one mainstream method of combating this threat in its various forms, but previous content-based filtering methods struggle to balance efficiency and effectiveness. In this paper we seek a linear solution to this problem: two online linear classifiers, the Perceptron and Winnow, are explored for this task on three benchmark corpora, namely the English corpora PU1 and Lingspam and the Chinese corpus 2005-Jun. Our experiments show that both classifiers filter spam emails effectively as well as efficiently. They also perform much better than a standard Naïve Bayes method. In fact, to the best of our knowledge, they achieve state-of-the-art performance for filtering Chinese spam emails, at least on the above corpora. Furthermore, both classifiers are easily updated adaptively and are thus suitable for real, dynamic environments.
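To make the idea of an online linear classifier concrete, the following is a minimal sketch of mistake-driven Perceptron and Winnow updates over binary token features. It is not the authors' implementation: the abstract does not state the feature representation, learning rate, promotion and demotion factors, or threshold, so all of those are assumptions here. The `tokenize` helper is hypothetical and simply splits on whitespace; Chinese text would need a word segmenter or character n-grams instead. The Winnow variant compares the average weight of the active features with a threshold of 1.0, which is one common way to handle an open vocabulary, and the toy messages are invented for illustration.

```python
def tokenize(text):
    """Hypothetical feature extractor: the set of lowercase whitespace tokens."""
    return set(text.lower().split())


class Perceptron:
    """Additive, mistake-driven updates over binary token features."""

    def __init__(self, learning_rate=1.0):  # learning rate is an assumption
        self.weights = {}
        self.bias = 0.0
        self.learning_rate = learning_rate

    def score(self, features):
        return self.bias + sum(self.weights.get(f, 0.0) for f in features)

    def predict(self, features):
        return self.score(features) > 0  # True means "spam"

    def update(self, features, is_spam):
        if self.predict(features) == is_spam:
            return  # correct prediction: leave the weights alone
        step = self.learning_rate if is_spam else -self.learning_rate
        for f in features:
            self.weights[f] = self.weights.get(f, 0.0) + step
        self.bias += step


class Winnow:
    """Multiplicative, mistake-driven updates; unseen features weigh 1.0."""

    def __init__(self, promotion=2.0, demotion=0.5, threshold=1.0):
        # All three constants are assumptions, not values from the paper.
        self.weights = {}
        self.promotion = promotion
        self.demotion = demotion
        self.threshold = threshold

    def score(self, features):
        if not features:
            return 0.0
        total = sum(self.weights.get(f, 1.0) for f in features)
        return total / len(features)  # average weight of active features

    def predict(self, features):
        return self.score(features) > self.threshold  # True means "spam"

    def update(self, features, is_spam):
        if self.predict(features) == is_spam:
            return
        factor = self.promotion if is_spam else self.demotion
        for f in features:
            self.weights[f] = self.weights.get(f, 1.0) * factor


# Invented toy stream of (message, is_spam) pairs, replayed a few times.
toy_stream = [
    ("win free money now", True),
    ("meeting agenda for monday", False),
    ("free prize claim now", True),
    ("project report attached", False),
]

perceptron, winnow = Perceptron(), Winnow()
for _ in range(3):
    for text, is_spam in toy_stream:
        feats = tokenize(text)
        perceptron.update(feats, is_spam)
        winnow.update(feats, is_spam)
```

Both classes change their weights only when a prediction is wrong, and each update touches only the features present in the current message. That is what makes this style of classifier cheap to keep current as new labeled mail arrives, in line with the adaptivity the abstract describes.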