Fourth International Conference on Genetic and Evolutionary Computing

Term-frequency Based Feature Selection Methods for Text Categorization

Abstract

A major difficulty in text categorization is the high dimensionality of the feature space, and feature selection is an important step for reducing it. Automatic feature selection methods such as document frequency thresholding (DF), information gain (IG), and mutual information (MI) are commonly applied in text categorization, but they do not use term frequency information. In this paper, we propose improved DF, IG, and MI methods that use term frequency information. Experiments show that the improved methods perform notably better than the original DF, IG, and MI methods.
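To make the distinction in the abstract concrete, the following is a minimal Python sketch of two of the baseline scores it names (DF and MI), alongside a raw corpus-wide term count. The term count is only an illustration of the kind of information that presence-based scores discard. It is an assumption for this sketch and not the paper's improved formulation, which the abstract does not specify. The function names and the toy corpus are likewise hypothetical, and IG is omitted for brevity.

```python
import math
from collections import Counter


def document_frequency(docs):
    """DF: number of documents in which each term occurs at least once."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set() ignores repeated occurrences
    return df


def total_term_frequency(docs):
    """Corpus-wide occurrence count per term (illustrative TF-aware score,
    an assumption of this sketch, not the paper's method)."""
    tf = Counter()
    for tokens in docs:
        tf.update(tokens)  # every occurrence counts
    return tf


def mutual_information(docs, labels, term, category):
    """Pointwise MI between term presence and a category:
    log(P(t, c) / (P(t) * P(c))), estimated from document counts."""
    n = len(docs)
    n_t = sum(1 for d in docs if term in d)
    n_c = sum(1 for y in labels if y == category)
    n_tc = sum(1 for d, y in zip(docs, labels) if term in d and y == category)
    if n_tc == 0:
        return float("-inf")  # term never co-occurs with the category
    return math.log((n_tc * n) / (n_t * n_c))


def select_top_k(scores, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Toy corpus: each document is a list of tokens.
docs = [
    ["ball", "ball", "ball", "team"],
    ["ball", "team", "team"],
    ["vote", "team"],
    ["vote", "vote", "law"],
]
labels = ["sport", "sport", "politics", "politics"]

df = document_frequency(docs)
tf = total_term_frequency(docs)
print(select_top_k(df, 2))
print(select_top_k(tf, 2))
print(mutual_information(docs, labels, "ball", "sport"))
```

In this toy corpus, "ball" and "vote" have the same document frequency even though "ball" occurs more often overall, which is the kind of difference a presence-only score cannot see.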