A Variance-Based Text Feature Selection Algorithm
Journal: Computer Engineering (计算机工程)




Traditional feature selection algorithms for Chinese text classification perform poorly when the feature dimension is low. To address this, the paper proposes an evaluation function based on variance that selects terms carrying strong class information, improving classification accuracy on rare classes without degrading overall classification performance. Experiments use a centroid-vector (Rocchio) classifier on the TanCorpV1.0 corpus. The results show that the proposed Variance Feature Selection Method (VFSM) has a clear advantage in low-dimensional feature spaces: compared with nine commonly used feature selection algorithms, including Document Frequency (DF), Information Gain (IG), Mutual Information (MI), and Chi-square Statistics (CHI), it achieves a substantially higher macro-averaged F1.
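
The abstract does not state the evaluation function itself. As a rough illustration of the general idea only, the sketch below scores each term by the variance of its per-class document rate, so that terms concentrated in a few classes rank above terms spread evenly across all classes. The scoring formula and the names `variance_scores` and `select_top_k` are assumptions made for this example; they are not the paper's VFSM definition.

```python
from collections import defaultdict


def variance_scores(docs, labels):
    """Score terms by the variance of their per-class document rate.

    Illustrative assumption, not the paper's formula: a term whose
    document rate differs strongly between classes gets a high score.
    """
    classes = sorted(set(labels))
    class_sizes = {c: labels.count(c) for c in classes}

    # df[term][cls] = number of documents of class cls that contain term
    df = defaultdict(lambda: defaultdict(int))
    for tokens, cls in zip(docs, labels):
        for term in set(tokens):
            df[term][cls] += 1

    scores = {}
    for term, per_class in df.items():
        # Normalize by class size so small classes count as much as large ones.
        rates = [per_class.get(c, 0) / class_sizes[c] for c in classes]
        mean = sum(rates) / len(rates)
        scores[term] = sum((r - mean) ** 2 for r in rates) / len(rates)
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy corpus: tokenized documents with class labels.
docs = [
    ["football", "match", "the"],
    ["football", "goal", "the"],
    ["stock", "market", "the"],
    ["stock", "price", "the"],
]
labels = ["sports", "sports", "finance", "finance"]

scores = variance_scores(docs, labels)
print(select_top_k(scores, 2))
```

In this toy corpus the term "the" appears in every document of both classes, so its score is zero, while "football" and "stock" each occur in only one class and rank highest. Dividing by class size is one possible way to keep rare classes from being swamped by frequent ones, which is the concern the abstract raises; whether the paper does the same is not stated.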




