Venue: IEEE International Conference on Multimedia Big Data

Fusing Gini Index and Term Frequency for Text Feature Selection

Abstract

Automatic text classification is a key technology for processing and organizing large-scale text data. The high dimensionality of the feature space is well known to be a main challenge for text classification. To mitigate this problem, and inspired by existing work, we propose a text feature selection algorithm, named Gini-TF, that fuses two classical methods: the Gini index and term frequency (TF). Specifically, the Gini-TF function is constructed by combining purity-based Gini index text feature selection with the standard term frequency-inverse document frequency (TF-IDF) method. This computationally efficient fusion is beneficial for improving the efficacy of text feature selection. Experimental results show that the proposed Gini-TF algorithm effectively reduces the dimensionality of the text feature space and improves text classification accuracy compared with several classical methods.
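The abstract does not state the Gini-TF function itself, so the following is only a minimal sketch of the general idea of weighting a purity-based Gini score by term frequency. Everything in it is an assumption made for illustration: the purity estimate (the sum over classes of the squared P(class | term), computed from document counts), the log-scaled corpus term frequency used in place of TF-IDF, and the names `gini_tf_scores` and `select_features`. It should not be read as the authors' formula.

```python
import math
from collections import Counter, defaultdict


def gini_tf_scores(docs, labels):
    """Score each term as (Gini purity) * (log-scaled term frequency).

    Illustrative assumptions, not the paper's Gini-TF function:
      purity(t) = sum over classes c of P(c | t)^2, from document counts
      tf(t)     = log(1 + total occurrences of t in the corpus)
    """
    df_by_class = defaultdict(Counter)  # term -> class -> docs containing the term
    total_tf = Counter()                # term -> total occurrences in the corpus
    for tokens, label in zip(docs, labels):
        total_tf.update(tokens)
        for term in set(tokens):
            df_by_class[term][label] += 1

    scores = {}
    for term, class_counts in df_by_class.items():
        n_docs = sum(class_counts.values())
        # Purity is 1.0 when the term occurs in a single class only.
        purity = sum((n / n_docs) ** 2 for n in class_counts.values())
        scores[term] = purity * math.log1p(total_tf[term])
    return scores


def select_features(docs, labels, k):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    scores = gini_tf_scores(docs, labels)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy corpus (hypothetical data): tokenized documents and their class labels.
docs = [
    ["goal", "match", "team"],
    ["team", "goal", "goal"],
    ["stock", "market", "team"],
    ["market", "stock", "stock"],
]
labels = ["sport", "sport", "finance", "finance"]

top_terms = select_features(docs, labels, k=3)
print(top_terms)  # ['goal', 'stock', 'market']
```

In this toy corpus, "team" is as frequent as "goal" and "stock" but appears in both classes, so its lower purity pushes it out of the top three. That is the kind of trade-off a fusion of purity and frequency is meant to capture.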
