Conference paper, Asian Language Processing, 2009. IALP '09

The Improved Logistic Regression Models for Spam Filtering

Abstract

The logistic regression model has been successful in spam filtering. However, it has a drawback: during training, it adjusts the weights of features that appear in both spam and ham messages by the same amount. This paper presents an improved logistic regression model that reduces the impact of features appearing in both spam and ham messages. Byte-level n-grams are used to extract features from messages, and TONE (Train On or Near Error) is adopted; both have proved effective in state-of-the-art spam filtering systems. The official runs of the CEAS (Conference on Email and Anti-Spam) 2008 Spam-filter Challenge show that the proposed model is among the best methods. Our system achieved competitive results in all tasks and won the active learning task on the live stream, as measured by 1-ROCA (one minus the area under the ROC curve).
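The abstract combines three standard ingredients: byte-level n-gram features, an online logistic regression model, and TONE training. Below is a minimal Python sketch of a plain (not improved) online logistic regression filter built from them. It assumes n = 4, binary features, a learning rate of 0.002, and a TONE margin measured on the predicted spam probability. These values and all names (`byte_ngrams`, `OnlineLogisticRegression`, `margin`) are hypothetical illustrations, not taken from the paper. The sketch deliberately does not implement the paper's improved weighting, because the abstract does not specify it. Its update rule shows the equal adjustment of shared-feature weights that the abstract identifies as the weakness.

```python
import math
from typing import Dict, Set


def byte_ngrams(message: bytes, n: int = 4) -> Set[bytes]:
    """Binary feature set: every distinct overlapping n-byte substring (n=4 is an assumption)."""
    return {message[i:i + n] for i in range(len(message) - n + 1)}


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OnlineLogisticRegression:
    """Plain online logistic regression with a TONE training rule (hypothetical sketch)."""

    def __init__(self, learning_rate: float = 0.002, margin: float = 0.45):
        self.weights: Dict[bytes, float] = {}  # sparse weights; unseen features count as 0
        self.learning_rate = learning_rate     # assumed value, not from the paper
        self.margin = margin                   # TONE: train if |p - 0.5| < margin (assumed form)

    def spam_probability(self, features: Set[bytes]) -> float:
        """P(spam | message) = sigmoid(sum of weights of the active binary features)."""
        score = sum(self.weights.get(f, 0.0) for f in features)
        return _sigmoid(score)

    def train(self, features: Set[bytes], is_spam: bool) -> bool:
        """Update only on an error or near error; return True if an update was made."""
        p = self.spam_probability(features)
        misclassified = (p > 0.5) != is_spam
        near_boundary = abs(p - 0.5) < self.margin
        if not (misclassified or near_boundary):
            return False  # confidently correct: TONE skips this message
        gradient = (1.0 if is_spam else 0.0) - p
        # Every active feature receives the same step, including features that also
        # occur in ham; this is the equal adjustment the improved model targets.
        for f in features:
            self.weights[f] = self.weights.get(f, 0.0) + self.learning_rate * gradient
        return True
```

In use, each incoming message is first scored with `spam_probability` and then passed to `train` once its gold label is known. TONE keeps the filter from spending updates on messages it already classifies confidently.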