Journal: Computer Engineering and Applications (计算机工程与应用)

A Uyghur Feature Selection Method Based on Improved Information Gain

     

Abstract

Feature selection is a key step in Uyghur text classification and directly affects the classification results. To improve the performance of the traditional information gain algorithm on Uyghur feature selection, a new information gain feature selection method is proposed, based on an in-depth analysis of the characteristics of the Uyghur language. The method corrects traditional information gain by combining in-class term frequency, a feature distribution coefficient, and inverse document frequency. It also introduces a candidate-feature distribution coefficient to balance the number of features selected across classes. Experiments on a Uyghur text dataset show that the improved algorithm markedly improves Uyghur text classification.
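For orientation, the traditional information gain score that the method starts from can be sketched as below. This is only the standard baseline, IG(t) = H(C) - P(t) H(C|t) - P(not t) H(C|not t), computed from document-level term presence. The abstract does not give the formulas for the in-class term frequency, distribution coefficient, or inverse document frequency corrections, so none of them are reproduced here. The function names and the toy tokens are illustrative assumptions.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (base 2) of a list of non-negative class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """Traditional information gain of `term` with respect to the class labels.

    docs   : list of token sets, one per document (hypothetical input format)
    labels : list of class labels, aligned with docs
    """
    n = len(docs)
    classes = sorted(set(labels))
    all_counts = Counter(labels)
    # Class counts restricted to documents that contain the term.
    with_counts = Counter(l for d, l in zip(docs, labels) if term in d)
    n_with = sum(with_counts.values())
    n_without = n - n_with

    h_c = entropy([all_counts[c] for c in classes])
    h_with = entropy([with_counts[c] for c in classes])
    h_without = entropy([all_counts[c] - with_counts[c] for c in classes])
    # Prior class entropy minus the expected entropy after observing the term.
    return h_c - (n_with / n) * h_with - (n_without / n) * h_without


# Toy corpus: "a" separates the two classes perfectly, "c" appears everywhere.
docs = [{"a", "c"}, {"a", "c"}, {"b", "c"}, {"b", "c"}]
labels = ["x", "x", "y", "y"]
scores = {t: information_gain(docs, labels, t) for t in ("a", "b", "c")}
```

In this toy corpus a term that appears in every document scores zero, while a term confined to one class scores the full class entropy. Ranking by this score alone considers only whether a term is present in a document, not how often or how evenly it occurs within a class, which is the kind of information the correction factors named in the abstract add.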
