Chinese Spam Email Filtering Based on Lasso

Journal of East China Jiaotong University (《华东交通大学学报》)

Abstract

Text email data represented with the vector space model are high-dimensional and sparse, which makes them poorly suited to building email filtering classifiers; dimensionality reduction is therefore usually applied before classifier training. Lasso regression is a multivariate linear model based on l1 regularization that performs variable selection while estimating the model parameters. This paper proposes using Lasso regression for spam filtering and builds two models: a Lasso regression email classification model, and a classification model that combines Lasso-based term selection with logistic regression. Spam filtering experiments are carried out on TREC06C, a Chinese text spam email dataset. The results show that the model combining Lasso-based term selection with logistic regression achieves the better performance.
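
The two-stage idea in the abstract (Lasso for term selection, then logistic regression on the selected terms) can be illustrated with a minimal sketch. The abstract does not describe the authors' implementation, so everything here is an assumption made for illustration: the toy term-presence matrix, the hypothetical terms `invoice`, `meeting` and `hello`, the 0/1 label encoding as a regression target, the penalty `alpha=0.05`, and the choice of cyclic coordinate descent and plain gradient descent as solvers.

```python
import math


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the l1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def center(values):
    """Subtract the mean so that no explicit intercept is needed."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def lasso_fit(X, y, alpha, n_sweeps=100):
    """Minimise (1/(2n))*||y - Xw||^2 + alpha*||w||_1 on centred data
    by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    cols = [center([row[j] for row in X]) for j in range(p)]
    residual = center(y)  # residual of the model with w = 0
    w = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            col = cols[j]
            scale = sum(c * c for c in col) / n
            if scale == 0.0:  # constant term: carries no information
                continue
            # Correlation of column j with the residual that leaves out j.
            rho = sum(c * (r + w[j] * c) for c, r in zip(col, residual)) / n
            new_w = soft_threshold(rho, alpha) / scale
            residual = [r - (new_w - w[j]) * c for c, r in zip(col, residual)]
            w[j] = new_w
    return w


def logistic_fit(X, y, lr=0.5, n_steps=500):
    """Batch gradient descent on the logistic loss (no regularisation)."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_steps):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, target in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            err = 1.0 / (1.0 + math.exp(-z)) - target
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * row[j] / n
        b -= lr * grad_b
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
    return w, b


def predict_spam(w, b, row):
    """Label a message as spam when the predicted probability exceeds 0.5."""
    z = b + sum(wj * xj for wj, xj in zip(w, row))
    return 1.0 / (1.0 + math.exp(-z)) > 0.5


# Hypothetical toy data: rows are emails, columns are term-presence flags.
terms = ["invoice", "meeting", "hello"]
X = [[1, 1, 1], [1, 1, 1], [1, 0, 1], [1, 0, 1],
     [0, 1, 1], [0, 1, 1], [0, 0, 1], [0, 0, 1]]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = spam, 0 = legitimate

# Stage 1: Lasso keeps only the terms with a nonzero coefficient.
lasso_w = lasso_fit(X, y, alpha=0.05)
selected = [j for j, wj in enumerate(lasso_w) if wj != 0.0]

# Stage 2: logistic regression is trained on the reduced term set.
X_sel = [[row[j] for j in selected] for row in X]
log_w, log_b = logistic_fit(X_sel, y)

print([terms[j] for j in selected])
print([predict_spam(log_w, log_b, row) for row in X_sel])
```

On real data such as TREC06C, the term matrix would have many thousands of sparse columns, and a library solver with a cross-validated penalty would be the more practical choice; the sketch only shows how the l1 penalty drives uninformative term coefficients to exactly zero before the second-stage classifier is trained.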
