SPAM FILTERING BASED ON STATISTICS AND TOKEN FREQUENCY MODELING

Abstract

Embodiments are directed toward classifying messages as spam using a two-phase approach. The first phase employs a statistical classifier to classify messages based on message content. The second phase targets specific message types to capture dynamic characteristics of the messages and identifies spam messages using a token-frequency-based approach. A client component receives messages and sends them to the statistical classifier, which determines the probability that a message belongs to a particular class. The statistical classifier also provides other information about the message, including a token list and token thresholds. The message class, token list, and thresholds are passed to the second phase, which determines the number of spam tokens in a given message for a given message class. Based on the threshold, the client component then determines whether the message is spam or non-spam.
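
The two-phase flow described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and does not come from the patent: the class names, keyword sets, token lists, and threshold values in `PROFILES` are hypothetical, the keyword-overlap scoring in `classify` is only a toy stand-in for the unspecified statistical classifier, and the choice to flag a message when the spam-token count reaches the threshold (`>=`) is likewise assumed.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassifierResult:
    message_class: str      # most probable class for the message
    probability: float      # estimated probability of that class
    token_list: frozenset   # spam tokens associated with the class
    threshold: int          # spam-token count at which a message is flagged


# Hypothetical class profiles; a real system would learn these from data.
PROFILES = {
    "pharmacy": {
        "keywords": {"pills", "pharmacy", "prescription"},
        "token_list": frozenset({"cheap", "pills", "discount", "order"}),
        "threshold": 3,
    },
    "finance": {
        "keywords": {"loan", "credit", "invoice"},
        "token_list": frozenset({"guaranteed", "loan", "urgent", "wire"}),
        "threshold": 3,
    },
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def classify(text):
    """Phase one (toy stand-in): score each class by keyword overlap,
    add-one smooth the scores, and normalize them into probabilities."""
    tokens = set(tokenize(text))
    scores = {
        name: 1 + len(tokens & profile["keywords"])
        for name, profile in PROFILES.items()
    }
    best = max(scores, key=scores.get)
    profile = PROFILES[best]
    return ClassifierResult(
        message_class=best,
        probability=scores[best] / sum(scores.values()),
        token_list=profile["token_list"],
        threshold=profile["threshold"],
    )


def count_spam_tokens(text, token_list):
    """Phase two: count occurrences of the class-specific spam tokens."""
    counts = Counter(tokenize(text))
    return sum(counts[token] for token in token_list)


def is_spam(text):
    """Client component: run phase one, then apply the phase-two threshold.
    Using >= for the comparison is an assumption of this sketch."""
    result = classify(text)
    return count_spam_tokens(text, result.token_list) >= result.threshold
```

In this sketch, a message such as "Cheap pills, discount pills! Order from our pharmacy today" is assigned to the hypothetical `pharmacy` class and flagged, while "Your prescription is ready at the pharmacy" falls into the same class but contains no spam tokens and passes. That difference is what the second phase adds: two messages of the same class are separated by their token frequencies rather than by class membership alone.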
