Conference: International Conference on Genetic and Evolutionary Computing

Term-frequency Based Feature Selection Methods for Text Categorization

Abstract

A major difficulty in text categorization is the high dimensionality of the feature space. Feature selection is therefore an important step for reducing that space. Automatic feature selection methods such as document frequency thresholding (DF), information gain (IG), and mutual information (MI) are commonly applied in text categorization, but they do not use term frequency information. In this paper, we propose improved DF, IG, and MI methods that use term frequency information. Experiments show that the improved methods perform notably better than the original DF, IG, and MI methods.
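
As a rough illustration of the baseline criteria named in the abstract, the sketch below computes DF, IG, and MI in their standard textbook forms over a toy corpus. All function names, the corpus, and the `collection_term_frequency` helper are assumptions made for illustration. The abstract does not say how the improved methods incorporate term frequency, so no improved variant is shown; the helper only makes visible the within-document counts that the document-level criteria discard.

```python
import math
from collections import Counter

# Toy corpus (hypothetical): each document is a list of tokens.
docs = [
    ["ball", "ball", "ball", "team"],
    ["ball", "team", "score"],
    ["vote", "ball"],
    ["vote", "election", "vote"],
]
labels = ["sport", "sport", "politics", "politics"]


def document_frequency(docs):
    """Number of documents containing each term; repeats inside a document are ignored."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def collection_term_frequency(docs):
    """Total occurrences of each term over the corpus (illustrative helper, not from the paper)."""
    tf = Counter()
    for tokens in docs:
        tf.update(tokens)
    return tf


def _entropy(label_subset):
    """Shannon entropy (bits) of the class distribution in label_subset."""
    total = len(label_subset)
    if total == 0:
        return 0.0
    counts = Counter(label_subset)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """IG of the binary event 'term occurs in the document' with respect to the class."""
    n = len(docs)
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    return (_entropy(labels)
            - (len(present) / n) * _entropy(present)
            - (len(absent) / n) * _entropy(absent))


def mutual_information(docs, labels, term, category):
    """Pointwise MI, log2(P(t, c) / (P(t) * P(c))), estimated from document counts."""
    n = len(docs)
    both = sum(1 for d, y in zip(docs, labels) if term in d and y == category)
    with_term = sum(1 for d in docs if term in d)
    in_category = sum(1 for y in labels if y == category)
    if both == 0:
        return float("-inf")  # term never co-occurs with the category
    return math.log2(both * n / (with_term * in_category))


if __name__ == "__main__":
    df = document_frequency(docs)
    tf = collection_term_frequency(docs)
    for term in sorted(df):
        print(term, df[term], tf[term],
              round(information_gain(docs, labels, term), 3),
              round(mutual_information(docs, labels, term, "sport"), 3))
```

In this toy corpus, "ball" appears in three documents but five times in total. All three document-level criteria see only the former count, which is the gap the abstract points to.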
