Automated generation of spam-detection rules using optical character recognition and identifications of common features

Abstract

In a spam detection method and system, optical character recognition (OCR) techniques are applied to a set of images that have been identified as spam. The images may be provided as initial training data for the spam detection system. In the preferred embodiment, however, they are provided to update the spam-detection rules of systems already running at different locations. The OCR generates text strings representative of the content of the individual images. Automated techniques are then applied to the text strings to identify common features or patterns, such as misspellings. These misspellings may be included intentionally to evade detection, or they may be introduced as OCR errors because the text is obscured. Spam-detection rules are generated automatically on the basis of the identified common features. The rules are then applied to electronic communications, such as electronic mail, to detect occurrences of spam within them.
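The abstract does not say how common features are found or how rules are expressed. What follows is a minimal Python sketch under several stated assumptions:

- The OCR step has already run. The strings in the hypothetical `OCR_TEXTS` list stand in for OCR output.
- A "common feature" is a token that appears in at least a fixed fraction of the spam images. It is counted after undoing a few assumed digit-for-letter substitutions.
- Each generated rule is a regular expression that tolerates those same lookalike characters.

The names `TO_CANONICAL`, `LOOKALIKES`, `common_features`, `rule_for` and `is_spam` are illustrative inventions, as are the thresholds. None of them comes from the patent.

```python
import re
from collections import Counter

# Hypothetical OCR output for images already labeled as spam. A real system
# would obtain these strings from an OCR engine; here they are hard-coded.
OCR_TEXTS = [
    "BUY V1AGRA NOW cheap meds",
    "cheap VIAGRA online, buy now",
    "V14GRA discount - buy today, ch3ap",
]

# Assumed digit-for-letter substitutions (deliberate obfuscation or OCR errors),
# mapped back to a canonical letter so variant spellings are counted together.
TO_CANONICAL = str.maketrans("10345", "ioeas")
# Assumed character classes that let a rule match the lookalike spellings.
LOOKALIKES = {"i": "[il1]", "l": "[il1]", "o": "[o0]", "e": "[e3]", "a": "[a4]", "s": "[s5]"}


def tokens(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def common_features(texts: list[str], min_fraction: float = 0.6, min_len: int = 4) -> list[str]:
    """Return canonical tokens found in at least min_fraction of the OCR texts."""
    doc_freq = Counter()
    for text in texts:
        # Count each canonical token once per image (document frequency).
        doc_freq.update({t.translate(TO_CANONICAL) for t in tokens(text) if len(t) >= min_len})
    threshold = min_fraction * len(texts)
    return sorted(t for t, n in doc_freq.items() if n >= threshold)


def rule_for(feature: str) -> re.Pattern:
    """Build a case-insensitive whole-word regex tolerant of lookalike characters."""
    body = "".join(LOOKALIKES.get(ch, re.escape(ch)) for ch in feature)
    return re.compile(rf"\b{body}\b", re.IGNORECASE)


def is_spam(message: str, rules: list[re.Pattern], min_hits: int = 2) -> bool:
    """Flag a message when it matches at least min_hits generated rules."""
    return sum(1 for rule in rules if rule.search(message)) >= min_hits


if __name__ == "__main__":
    features = common_features(OCR_TEXTS)
    rules = [rule_for(f) for f in features]
    print(features)                                  # ['cheap', 'viagra']
    print(is_spam("Get VlAGRA ch3ap here", rules))   # True
    print(is_spam("Lunch is cheap today", rules))    # False
```

Folding lookalike spellings such as `V1AGRA` and `V14GRA` into one canonical token lets misspellings from different images reinforce each other. Building each rule from character classes also lets it catch variants that never appeared in the training images, such as `VlAGRA`. Requiring two hits in `is_spam` is an arbitrary choice meant to limit false positives from ordinary words like "cheap".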