Conference: The 2nd International Conference on Software Engineering and Data Mining

Complete Gini-Index Text (GIT) feature-selection algorithm for text classification


Abstract

The recently introduced Gini-Index Text (GIT) feature-selection algorithm for text classification incorporates an improved Gini index to achieve better feature-selection performance, but it has some drawbacks. Specifically, under real-world experimental conditions the algorithm concentrates feature values at a single point and is therefore inadequate for selecting representative features. As a result, good representative features cannot be estimated, nor can good performance be achieved in unbalanced text classification. We therefore propose a new, complete GIT feature-selection algorithm for text classification. According to the experimental results, the new algorithm obtained unbiased feature values and eliminated many irrelevant and redundant features from the feature subsets while retaining many representative features. Furthermore, compared with the original version, the new algorithm demonstrated notably improved overall classification performance.
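To make the idea of Gini-index-based term scoring concrete, here is a minimal sketch. It assumes an improved Gini index of the form Gini(t) = Σ_i P(t|C_i)^2 · P(C_i|t)^2, with both probabilities estimated from per-class document frequencies. That formula, the function names `gini_scores` and `select_top_k`, and the toy data are illustrative assumptions; this is not the complete GIT algorithm proposed in this paper, whose modifications the abstract does not specify.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def gini_scores(docs: Sequence[Sequence[str]], labels: Sequence[str]) -> Dict[str, float]:
    """Score each term t by sum_i P(t|C_i)^2 * P(C_i|t)^2.

    ASSUMPTION: this improved-Gini form is used for illustration only;
    it is not claimed to be the paper's complete GIT formula.
    """
    class_doc_counts = Counter(labels)  # number of documents per class
    term_class_df: Dict[str, Counter] = defaultdict(Counter)  # term -> class -> doc frequency
    for tokens, label in zip(docs, labels):
        for term in set(tokens):  # count each term at most once per document
            term_class_df[term][label] += 1

    scores: Dict[str, float] = {}
    for term, per_class in term_class_df.items():
        term_df = sum(per_class.values())  # documents containing the term
        score = 0.0
        # Classes where the term never occurs contribute 0, so skipping them is safe.
        for label, df in per_class.items():
            p_t_given_c = df / class_doc_counts[label]  # P(t | C_i)
            p_c_given_t = df / term_df  # P(C_i | t)
            score += (p_t_given_c ** 2) * (p_c_given_t ** 2)
        scores[term] = score
    return scores


def select_top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    # Hypothetical toy corpus with two classes.
    docs = [["cheap", "pills", "buy"], ["cheap", "offer"],
            ["meeting", "agenda"], ["meeting", "notes", "offer"]]
    labels = ["spam", "spam", "ham", "ham"]
    scores = gini_scores(docs, labels)
    print(select_top_k(scores, 2))  # ['cheap', 'meeting']
```

In this toy corpus, "cheap" and "meeting" each occur in every document of exactly one class and score 1.0, whereas "offer", spread evenly across both classes, scores only 0.125 and is a candidate for elimination.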
