SYSTEM AND METHOD FOR DETERMINING SPAM-CONTAINING MESSAGE BY TOPIC OF MESSAGE SENT VIA E-MAIL

Abstract

FIELD: information technology.

SUBSTANCE: A method for detecting spam in a message sent via e-mail is disclosed, wherein:
a) a message processing means receives the message sent via e-mail, the message header containing a message subject in the form of text comprising more than three words;
b) the message processing means determines the text parameters of the message subject, the text parameters being at least one of: the language in which the text of the message subject is written, the number of words, the number of articles, the number of punctuation characters, the number of pronouns, or the number of prepositions in the text of the message subject;
c) a coefficient determining means determines the coefficients k and n for constructing k-skip-n-grams of word combinations, based on the text parameters of the message subject and according to rules defining the coefficients;
d) the coefficient determining means generates a set of k-skip-n-grams of word combinations from the text of the message subject using the determined values of k and n;
e) a vector construction means constructs, for each k-skip-n-gram of word combinations in the generated set, a vector used to calculate the degree of cosine similarity;
f) for each constructed vector, the vector construction means calculates the degree of cosine similarity with known vectors from a vector database;
g) a spam detection means determines the topic category of the message based on the plurality of calculated degrees of cosine similarity with the known vectors;
h) the spam detection means calculates the current value of the spam coefficient based on the plurality of calculated degrees of cosine similarity of all constructed vectors; and
i) the spam detection means detects spam in the received message when the spam coefficient exceeds a certain threshold value.

EFFECT: detection of spam in messages sent via e-mail.

2 cl, 5 dwg
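
The abstract lists the processing steps but does not disclose the rules that map text parameters to k and n, how each k-skip-n-gram becomes a vector, what the known-vector database contains, or how the similarities are combined into the spam coefficient. The Python sketch below is therefore a minimal, hypothetical illustration of the flow, not the patented implementation. The word lists, the rule in `choose_k_n`, the bag-of-words vectors, the toy `KNOWN_VECTORS` database, averaging each gram's best spam similarity into the coefficient, and the 0.4 threshold are all assumptions, and language detection is omitted.

```python
import math
import re
import string
from collections import Counter
from itertools import combinations

# Hypothetical English word lists; the patent does not publish its own.
ARTICLES = {"a", "an", "the"}
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they", "me", "us", "them"}
PREPOSITIONS = {"in", "on", "at", "for", "to", "of", "with", "from", "by"}

# Hypothetical vector database: (topic category, is_spam, known vector).
KNOWN_VECTORS = [
    ("finance", True, Counter(["win", "cash"])),
    ("finance", True, Counter(["free", "money"])),
    ("pharmacy", True, Counter(["cheap", "pills"])),
    ("work", False, Counter(["meeting", "agenda"])),
    ("work", False, Counter(["project", "update"])),
]


def tokenize(subject):
    return re.findall(r"[a-z']+", subject.lower())


def text_parameters(subject):
    """Some of the subject-line parameters listed in the abstract (language detection omitted)."""
    words = tokenize(subject)
    return {
        "words": len(words),
        "articles": sum(w in ARTICLES for w in words),
        "punctuation": sum(ch in string.punctuation for ch in subject),
        "pronouns": sum(w in PRONOUNS for w in words),
        "prepositions": sum(w in PREPOSITIONS for w in words),
    }


def choose_k_n(params):
    """Hypothetical rule for k and n; the patent's actual rules are not given in the abstract."""
    n = 2 if params["words"] <= 6 else 3
    function_words = params["articles"] + params["pronouns"] + params["prepositions"]
    k = 2 if function_words else 1  # allow more skips when function words pad the subject
    return k, n


def k_skip_n_grams(words, k, n):
    """All in-order n-word subsequences that skip at most k words in total."""
    grams = []
    for i in range(len(words)):
        # The last word of a gram may be at most n + k - 1 positions after the first.
        window = range(i + 1, min(len(words), i + n + k))
        for rest in combinations(window, n - 1):
            grams.append(tuple(words[j] for j in (i, *rest)))
    return grams


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (Counter returns 0 for missing keys)."""
    dot = sum(count * v[t] for t, count in u.items())
    norm = math.sqrt(sum(x * x for x in u.values()) * sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def classify(subject, threshold=0.4):
    """Return (topic category, spam coefficient, is_spam) for a subject line."""
    params = text_parameters(subject)
    if params["words"] <= 3:  # the method targets subjects with more than three words
        return None, 0.0, False
    k, n = choose_k_n(params)
    # Bag-of-words vector per k-skip-n-gram (a stand-in for the unspecified encoding).
    vectors = [Counter(gram) for gram in k_skip_n_grams(tokenize(subject), k, n)]
    topic_scores = Counter()
    spam_scores = []
    for vec in vectors:
        # Best-matching known vector for this gram.
        best_sim, best_topic, best_is_spam = max(
            (cosine(vec, known), topic, is_spam) for topic, is_spam, known in KNOWN_VECTORS
        )
        if best_sim > 0:
            topic_scores[best_topic] += best_sim
        spam_scores.append(best_sim if best_is_spam else 0.0)
    topic = topic_scores.most_common(1)[0][0] if topic_scores else None
    spam_coefficient = sum(spam_scores) / len(spam_scores) if spam_scores else 0.0
    return topic, spam_coefficient, spam_coefficient > threshold
```

With these toy assumptions, `classify("Win cash now with free money")` maps to the `finance` topic with a spam coefficient of about 0.54, above the assumed threshold, while `classify("Project update for the meeting agenda")` maps to `work` with a coefficient of 0.0. A real system might instead use learned embeddings; sparse word counts are used here only to keep the cosine-similarity step easy to follow.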