Published in: Expert Systems with Applications

Improved inverse gravity moment term weighting for text classification

Abstract

Text classification is a popular high-dimensional classification problem in which better feature vector representations directly improve classification performance. Assigning appropriate weights to features (terms) is therefore crucial for obtaining effective feature vector representations. The methods used to weight terms in text classification are called term weighting schemes. Although several term weighting schemes exist for text classification, they are not fully effective, and researchers continue to propose new ones. In this study, two novel term weighting schemes, SQRT_TF-IGM(imp) and TF-IGM(imp), derived from the standard inverse gravity moment (IGM) formula, are proposed to improve the weighting behavior of the existing TF-IGM scheme, especially in some extreme cases. The proposed schemes are compared with two standard IGM-based schemes and five other state-of-the-art term weighting methods on an unbalanced dataset (Reuters-21578) and two balanced datasets (20 Mini Newsgroups and 20 Newsgroups), using KNN, SVM, and NN classifiers. Micro-F1 and macro-F1 are used as evaluation measures. The experiments are conducted with a range of feature sizes to examine how feature size affects weighting performance. The experimental results showed that the proposed SQRT_TF-IGM(imp) method generally outperformed all other schemes, including both standard TF-IGM and standard SQRT_TF-IGM. The proposed TF-IGM(imp) scheme also mostly performed better than standard TF-IGM. To validate the best-performing proposed scheme, a t-test was also applied; it indicates that the performance gains of SQRT_TF-IGM(imp) over standard SQRT_TF-IGM are statistically significant. (C) 2019 Elsevier Ltd. All rights reserved.
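For orientation, the standard TF-IGM and SQRT_TF-IGM schemes that the proposed (imp) variants build on can be sketched as below. This is a minimal Python sketch assuming the commonly cited formulation of standard IGM: a term's class-wise frequencies are sorted in descending order as f_1 >= f_2 >= ... >= f_m, igm = f_1 / sum(f_r * r), and the term weight is tf * (1 + λ * igm), with sqrt(tf) replacing tf for SQRT_TF-IGM. The function names `igm` and `tf_igm`, the use of class-wise document frequencies, and the default λ = 7.0 are assumptions of this sketch, not details stated in the abstract. The improved (imp) formulas themselves are not given in the abstract and are not reproduced here.

```python
import math
from typing import List


def igm(class_doc_freqs: List[int]) -> float:
    """Standard inverse gravity moment of a single term (sketch).

    class_doc_freqs[r] is assumed to be the number of training documents
    in class r that contain the term (hypothetical input convention).
    """
    # Rank the class frequencies in descending order: f_1 >= f_2 >= ... >= f_m.
    ranked = sorted(class_doc_freqs, reverse=True)
    # "Gravity moment": each frequency weighted by its rank (1-based).
    denom = sum(f * r for r, f in enumerate(ranked, start=1))
    if denom == 0:
        return 0.0  # the term never occurs in any class
    return ranked[0] / denom


def tf_igm(tf: int, class_doc_freqs: List[int],
           lam: float = 7.0, sqrt_tf: bool = False) -> float:
    """TF-IGM weight of a term in one document (SQRT_TF-IGM if sqrt_tf=True).

    lam is the adjustable coefficient lambda; 7.0 is an assumed default.
    """
    local = math.sqrt(tf) if sqrt_tf else float(tf)
    return local * (1.0 + lam * igm(class_doc_freqs))


if __name__ == "__main__":
    # A term concentrated in one class gets the maximum IGM value of 1.0.
    print(igm([10, 0, 0]))
    # A term spread evenly over 4 classes gets the minimum value 2 / (4 * 5).
    print(igm([5, 5, 5, 5]))
    print(tf_igm(4, [10, 0, 0]), tf_igm(4, [10, 0, 0], sqrt_tf=True))
```

In this sketch a term that occurs in only one class receives igm = 1 and thus the largest global factor (1 + λ), while a term spread evenly over m classes receives the minimum igm = 2 / (m(m + 1)). The square root in SQRT_TF-IGM damps the influence of very large raw term frequencies.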