Conference: International conference on future information technology

A Study on Feature Selection Methods in Chinese Spam Filtering based on Maximum Entropy Model



Abstract

Chinese spam filtering, treated as a classification task, is attracting growing attention. In this paper, the Maximum Entropy Model (MEM) is employed as the classifier, and classification performance is investigated under four feature selection methods: Document Frequency (DF), CHI statistics, Information Gain (IG), and Mutual Information (MI). Experiments on the CCERT corpus show that DF and CHI are the best and most stable feature selection methods for Chinese spam filtering when MEM is applied. To our knowledge, this is the first comparison of these four feature selection methods with MEM in Chinese spam filtering.
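The abstract names the four feature selection criteria without giving their formulas. As a rough illustration only, the sketch below scores a single term against a single class (for example, spam) using the common textbook definitions based on a 2x2 document-count table. These definitions, the function names `entropy` and `feature_scores`, and the count names `a`, `b`, `c`, `d` are assumptions made here for illustration; the paper may define or aggregate the scores differently.

```python
import math


def entropy(counts):
    """Shannon entropy in bits of a list of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((k / total) * math.log2(k / total) for k in counts if k > 0)


def feature_scores(a, b, c, d):
    """Score one term against one class from document counts.

    Hypothetical count names (not taken from the paper):
      a: documents in the class that contain the term
      b: documents outside the class that contain the term
      c: documents in the class that lack the term
      d: documents outside the class that lack the term
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("at least one document is required")

    # Document Frequency: number of documents containing the term.
    df = a + b

    # CHI statistic of the 2x2 term/class table.
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    chi = n * (a * d - c * b) ** 2 / denom if denom else 0.0

    # Pointwise Mutual Information between term presence and the class.
    mi = math.log2(a * n / ((a + c) * (a + b))) if a > 0 else float("-inf")

    # Information Gain: class entropy minus the expected class entropy
    # after observing whether the term is present.
    ig = (
        entropy([a + c, b + d])
        - ((a + b) / n) * entropy([a, b])
        - ((c + d) / n) * entropy([c, d])
    )

    return {"DF": df, "CHI": chi, "MI": mi, "IG": ig}


if __name__ == "__main__":
    # Made-up counts, purely for illustration.
    print(feature_scores(40, 10, 10, 40))
```

Under this sketch, feature selection would amount to ranking all candidate terms by one of the four scores and keeping the top-ranked ones as input features for the classifier. A term distributed independently of the class gets CHI, MI, and IG of zero, while DF ignores class labels entirely.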
