SYSTEM AND METHOD FOR DETERMINING SPAM-CONTAINING MESSAGE BY TOPIC OF MESSAGE SENT VIA E-MAIL
Abstract
FIELD: information technology.
SUBSTANCE: a method for detecting spam in a message sent via e-mail is disclosed, wherein: a) a message processing means receives a message sent via e-mail whose header contains a message subject in the form of text comprising more than three words; b) the message processing means determines text parameters of the message subject, the text parameters being at least one of: the language in which the text of the message subject is written, the number of words, the number of articles, the number of punctuation characters, the number of pronouns, and the number of prepositions in the text of the message subject; c) a coefficient determining means determines coefficients k and n for constructing k-skip-n-grams of word combinations, based on the text parameters of the message subject and according to rules defining the coefficients; d) the coefficient determining means generates a set of k-skip-n-grams of word combinations from the text of the message subject using the determined values of k and n; e) a vector construction means constructs, for each k-skip-n-gram in the generated set, a vector for calculating the degree of cosine similarity; f) the vector construction means calculates, for each constructed vector, the degree of cosine similarity with known vectors from a vector database; g) a spam detection means determines a topic category of the message based on the plurality of calculated degrees of cosine similarity with the known vectors; h) the spam detection means calculates the current value of a spam coefficient based on the plurality of calculated degrees of cosine similarity of all constructed vectors; and i) the spam detection means detects spam in the received message when the spam coefficient exceeds a certain threshold value.
EFFECT: spam detection in a message sent via e-mail.
2 cl, 5 dwg
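The steps of the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not disclose the rules for choosing k and n, how the vectors are built, or how the spam coefficient is computed. The sketch assumes a hypothetical length-based rule (`choose_k_n`), bag-of-words count vectors, a tiny made-up vector database (`KNOWN_VECTORS`), and a spam coefficient equal to the mean of each k-skip-n-gram's best cosine similarity. A k-skip-n-gram is taken here to be an ordered combination of n words that skips at most k words in total.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def k_skip_n_grams(words, k, n):
    """Ordered n-word combinations that skip at most k words in total."""
    grams = []
    for start in range(len(words) - n + 1):
        # The remaining n - 1 words come from the next n - 1 + k positions.
        window_end = min(start + n + k, len(words))
        for rest in combinations(range(start + 1, window_end), n - 1):
            grams.append(tuple(words[i] for i in (start,) + rest))
    return grams


def choose_k_n(words):
    # Hypothetical rule (not from the source): longer subjects get
    # longer n-grams and more permitted skips.
    return (1, 2) if len(words) <= 5 else (2, 3)


def cosine_similarity(a, b):
    """Cosine similarity of two sparse count vectors (Counter objects)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Made-up stand-in for the vector database: (category, known vector) pairs.
KNOWN_VECTORS = [
    ("advertising", Counter(["cheap", "watches"])),
    ("advertising", Counter(["buy", "online"])),
    ("phishing", Counter(["verify", "account"])),
]


def classify_subject(subject, threshold=0.5):
    """Return (category, spam_coefficient, is_spam) for a message subject."""
    words = subject.lower().split()
    if len(words) <= 3:
        # The abstract only covers subjects with more than three words.
        return None, 0.0, False
    k, n = choose_k_n(words)
    best_scores = []
    category_scores = Counter()
    for gram in k_skip_n_grams(words, k, n):
        vector = Counter(gram)  # assumed bag-of-words vector
        score, category = max(
            (cosine_similarity(vector, known), cat) for cat, known in KNOWN_VECTORS
        )
        best_scores.append(score)
        if score > 0:
            category_scores[category] += score
    # Assumed aggregation: mean of the best similarity per k-skip-n-gram.
    coefficient = sum(best_scores) / len(best_scores) if best_scores else 0.0
    category = category_scores.most_common(1)[0][0] if category_scores else None
    return category, coefficient, coefficient > threshold
```

Calling `classify_subject("Buy cheap watches online now")` returns a tuple of the most similar category, the spam coefficient, and a flag that is true when the coefficient exceeds the threshold. The threshold of 0.5 is an arbitrary illustrative value.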