Conference: International Conference on Intelligent Human-Machine Systems and Cybernetics

An Improved Mutual Information-Based Feature Selection Algorithm for Text Classification

Abstract

Feature selection plays an important role in text classification and contributes directly to classification accuracy. The mutual information (MI)-based feature selection method has known defects: it tends to select rare words and words from small samples as features, and it can produce negative MI values. To correct these defects, this paper proposes an improved feature evaluation function for automatic text classification that jointly considers word frequency, concentration rate between classes, and dispersion within a class. According to the experimental results, the improved algorithm remedies the tendency of the original MI evaluation function to select rare words and significantly improves classification performance.
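The abstract does not give the improved evaluation function, so the sketch below only illustrates the defects it attributes to the original MI criterion. It assumes the common document-count form of pointwise MI between a term and a class, MI(t, c) = log(A·N / ((A + C)(A + B))); the function name, the count symbols, and the toy numbers are illustrative assumptions, not taken from the paper.

```python
import math

def mutual_information(a, b, c, n):
    """Pointwise MI between a term t and a class, from document counts.

    a: documents in the class that contain t
    b: documents outside the class that contain t
    c: documents in the class that do not contain t
    n: total number of documents
    (Assumed textbook formulation; the paper's exact definition is not shown.)
    """
    if a == 0:
        return float("-inf")  # term never co-occurs with the class
    return math.log((a * n) / ((a + c) * (a + b)))

# Hypothetical toy corpus: 200 documents, 100 of them in the target class.
N = 200

# A rare term: appears in a single document, which happens to be in the class.
rare = mutual_information(a=1, b=0, c=99, n=N)

# A frequent, clearly class-indicative term: 90 class docs, 10 other docs.
common = mutual_information(a=90, b=10, c=10, n=N)

# A term that is mostly found outside the class yields a negative score.
negative = mutual_information(a=10, b=90, c=90, n=N)

print(f"rare={rare:.3f} common={common:.3f} negative={negative:.3f}")
```

In this toy setting the single-occurrence term outscores the term that appears in 90% of the class documents, because the plain MI score ignores how often a term occurs. That is the kind of bias that weighting by word frequency, between-class concentration, and within-class dispersion is meant to counteract.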