A Bayesian Topic Model for Spam Filtering

Abstract

Spam is one of the major problems of today's Internet because it causes financial damage to companies and annoys individual users. Among the approaches developed to detect spam, content-based machine learning algorithms are important and popular. However, these algorithms are trained on statistical representations of the terms that usually appear in e-mails, and they are unable to account for the underlying semantics of terms within the messages. In this paper, we present a Bayesian topic model to address these limitations. We explore the use of semantics in spam filtering by representing e-mails as vectors of topics using a topic model, Latent Dirichlet Allocation (LDA). Based on this representation, the relationship between the topics and spam can be discovered using a Bayesian method. We test this model on the Enron-Spam datasets, and the results show that the proposed model performs better than the baseline and can detect the internal semantics of spam messages.
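The abstract does not spell out the model, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a small collapsed Gibbs sampler for LDA to turn each tokenized e-mail into a topic vector, and then relates topics to spam with Bayes' rule, scoring a message as P(spam | e-mail) = Σ_k θ_k · P(spam | topic k). The function names (`lda_gibbs`, `fit_topic_bayes`, `posterior_spam`), the hyperparameters, and the toy corpus are all hypothetical and chosen for illustration.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one topic vector per document."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # n(d, k)
    topic_word = [Counter() for _ in range(num_topics)]  # n(k, w)
    topic_total = [0] * num_topics                       # n(k)
    assign = []
    for d, doc in enumerate(docs):                       # random initial topics
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]                         # remove current assignment
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k                         # record resampled topic
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return [
        [(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]


def fit_topic_bayes(thetas, labels, smooth=1e-3):
    """Estimate P(class) and P(topic | class) from labeled topic vectors."""
    num_topics = len(thetas[0])
    prior, cond = {}, {}
    for c in sorted(set(labels)):
        rows = [t for t, y in zip(thetas, labels) if y == c]
        prior[c] = len(rows) / len(thetas)
        mass = [sum(r[k] for r in rows) + smooth for k in range(num_topics)]
        total = sum(mass)
        cond[c] = [m / total for m in mass]
    return prior, cond


def posterior_spam(theta, prior, cond, spam="spam"):
    """P(spam | e-mail) = sum over topics of theta_k * P(spam | topic k)."""
    p = 0.0
    for k, weight in enumerate(theta):
        evidence = sum(cond[c][k] * prior[c] for c in prior)
        p += weight * cond[spam][k] * prior[spam] / evidence  # Bayes' rule
    return p


if __name__ == "__main__":
    # Hypothetical toy corpus; a real experiment would use Enron-Spam messages.
    docs = [
        "free cash prize click now".split(),
        "win free prize cash offer".split(),
        "project meeting schedule notes".split(),
        "meeting notes project budget".split(),
    ]
    labels = ["spam", "spam", "ham", "ham"]
    thetas = lda_gibbs(docs, num_topics=2)
    prior, cond = fit_topic_bayes(thetas, labels)
    for doc, theta in zip(docs, thetas):
        print(" ".join(doc), round(posterior_spam(theta, prior, cond), 3))
```

In this sketch the topics are learned without labels, and only the Bayesian step uses the spam/ham annotations. Scoring unseen mail would additionally require inferring topic vectors for new documents, which is omitted here.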

Bibliographic details

  • Source
    Journal of information and computational science | 2013, Issue 12 | pp. 3719-3727 | 9 pages
  • Author affiliations

    School of Computer and Information Science, Southwest University, Chongqing 400715, China;

    School of Computer and Information Science, Southwest University, Chongqing 400715, China;

    School of Computer and Information Science, Southwest University, Chongqing 400715, China;

    School of Computer and Information Science, Southwest University, Chongqing 400715, China;

    School of Computer and Information Science, Southwest University, Chongqing 400715, China;

  • Original format: PDF
  • Language: English
  • Keywords

    Spam Detection; Latent Dirichlet Allocation; Bayesian Topic Model;

