spam classification model generating apparatus and method, and program

Abstract

PROBLEM TO BE SOLVED: To generate an appropriate spam determination model even from training data in which the examples are biased.

SOLUTION: A spam classification model generation device according to the present invention is configured to: receive an example consisting of a spam label, either spam page (spam) or non-spam page (ham), and its feature vector; calculate the loss of the example according to the size of a margin set for each class; store the pair of loss value and example information in a spam label array or a ham label array according to the spam label; remove the example with the maximum loss from the label array; extract the feature vector of that example; update a weight vector using the feature vector; calculate a spam score using the updated weight vector and a feature vector stored in classification data storage means; determine the example to be spam when the spam score is equal to or greater than a predetermined threshold, and ham when the spam score is less than the threshold; and output the determination result.

COPYRIGHT: (C)2014,JPO&INPIT
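The steps listed under SOLUTION can be illustrated with a minimal Python sketch. The abstract does not give the loss formula, the update rule, or any parameter values, so everything concrete here is an assumption: a hinge-style loss with a per-class margin, a perceptron-style additive update, and the hypothetical names `margin_loss`, `train`, `classify`, `margins`, `lr`, and `epochs`. It should be read as one possible reading of the abstract, not as the patented method.

```python
import heapq


def dot(w, x):
    """Inner product of a weight vector and a feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))


def margin_loss(w, x, y, margins):
    """Assumed hinge-style loss; y is +1 (spam) or -1 (ham).

    margins maps each label to its own margin, so the two classes
    can be penalized differently.
    """
    return max(0.0, margins[y] - y * dot(w, x))


def train(examples, dim, margins, lr=0.1, epochs=20):
    """examples is a list of (feature_vector, label) pairs."""
    w = [0.0] * dim
    for _ in range(epochs):
        # One array per label, kept as a max-heap on the loss
        # (losses are negated because heapq is a min-heap).
        arrays = {+1: [], -1: []}
        for idx, (x, y) in enumerate(examples):
            loss = margin_loss(w, x, y, margins)
            heapq.heappush(arrays[y], (-loss, idx))
        # Remove the max-loss example from each label array and use its
        # feature vector to update the weights. Losses were computed with
        # the weights from the start of this pass (a simplification).
        for y in (+1, -1):
            if not arrays[y]:
                continue
            neg_loss, idx = heapq.heappop(arrays[y])
            if neg_loss < 0:  # only update when the loss is positive
                x, _ = examples[idx]
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
    return w


def classify(w, x, threshold=0.0):
    """Spam score is w . x; at or above the threshold means spam."""
    return "spam" if dot(w, x) >= threshold else "ham"
```

Because each pass takes one update per label array regardless of how many examples that array holds, a class with few examples still contributes as many updates as a class with many. That is one plausible way such a scheme could cope with biased data, though the abstract itself does not spell out the mechanism.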