International Conference on Multimedia Big Data

Fusing Gini Index and Term Frequency for Text Feature Selection

Abstract

Automatic text classification is a key technology for processing and organizing large-scale text data. A well-known obstacle is the high dimensionality of the feature space. To mitigate this problem, and building on existing work, we propose an effective text feature selection algorithm, named Gini-TF, that fuses two classical methods: the Gini index and term frequency (TF). Specifically, the Gini-TF function is constructed by combining purity-based Gini index feature selection with the typical term frequency-inverse document frequency (TF-IDF) method. This computationally efficient fusion is intended to improve the efficacy of text feature selection. Experimental results show that, compared with several classical methods, the proposed Gini-TF algorithm could effectively reduce the dimension of the text feature space and improve text classification accuracy.
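The abstract does not state the exact Gini-TF formula, so the following is only a minimal sketch under stated assumptions. It uses a commonly cited purity form of the Gini index for text features, Gini(t) = Σ_C P(t|C)² · P(C|t)², and multiplies each class term by the term's relative frequency within that class. That Gini × TF combination, the omission of IDF, and the function names `gini_tf_scores` and `select_features` are hypothetical illustrations, not the authors' method.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def gini_tf_scores(docs: List[List[str]], labels: List[str]) -> Dict[str, float]:
    """Score terms with a purity-based Gini index weighted by in-class term frequency.

    ASSUMPTION: this Gini x TF combination is a hypothetical illustration,
    not the exact Gini-TF function from the paper (IDF is not modeled here).
    """
    docs_per_class = Counter(labels)               # N(C): documents per class
    tokens_per_class: Counter = Counter()          # total tokens per class
    df: Dict[str, Counter] = defaultdict(Counter)  # df[t][C]: docs of class C containing t
    tf: Dict[str, Counter] = defaultdict(Counter)  # tf[t][C]: occurrences of t in class C

    for tokens, cls in zip(docs, labels):
        tokens_per_class[cls] += len(tokens)
        for term, count in Counter(tokens).items():
            df[term][cls] += 1
            tf[term][cls] += count

    scores: Dict[str, float] = {}
    for term, class_df in df.items():
        docs_with_term = sum(class_df.values())
        score = 0.0
        for cls, n_cls in docs_per_class.items():
            p_t_given_c = class_df[cls] / n_cls            # P(t | C)
            p_c_given_t = class_df[cls] / docs_with_term   # P(C | t): class purity
            total = tokens_per_class[cls]
            tf_weight = tf[term][cls] / total if total else 0.0  # hypothetical TF factor
            score += (p_t_given_c ** 2) * (p_c_given_t ** 2) * tf_weight
        scores[term] = score
    return scores


def select_features(docs: List[List[str]], labels: List[str], k: int) -> List[str]:
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    scores = gini_tf_scores(docs, labels)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    docs = [["ball", "goal", "team"], ["goal", "match", "team"],
            ["vote", "party", "team"], ["vote", "election"]]
    labels = ["sport", "sport", "politics", "politics"]
    print(select_features(docs, labels, 3))  # ['vote', 'goal', 'team']
```

In this toy corpus, "team" appears in both classes, so its purity P(C|t) is diluted; its higher in-class frequency still ranks it above terms that occur only once. That trade-off between class purity and frequency is the kind of effect a TF factor adds to a pure Gini score.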