International Conference on Enterprise Information Systems

GU METRIC - A New Feature Selection Algorithm for Text Categorization

To improve the scalability of text categorization and reduce over-fitting, it is desirable to reduce the number of words used for categorization. Further, it is desirable to achieve this automatically without sacrificing categorization accuracy. Such techniques are known as automatic feature selection methods. Typically, each word is assigned a weight using a word-scoring metric, and the top-scoring words are then used to describe the document collection. Several word-scoring metrics have been employed in the literature. In this paper we present a novel feature selection method called the GU metric. The details of a comparative evaluation against the other methods are given. The results show that the GU metric outperforms some of the other well-known feature selection methods.
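The abstract does not define the GU metric itself, so the following is only a minimal sketch of the generic "score every word, keep the top-scoring words" pipeline it describes. It swaps in the chi-square statistic, a standard word-scoring metric, as a stand-in; it is not the GU metric. It assumes pre-tokenized documents represented as word sets with one category label each, and the function names `chi_square`, `score_terms` and `select_top_k` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic for a 2x2 term/category contingency table.

    NOTE: chi-square is a stand-in metric here; this is NOT the GU metric.
    a: docs in the category that contain the term
    b: docs outside the category that contain the term
    c: docs in the category that lack the term
    d: docs outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table (e.g. the term occurs in every document): no signal.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def score_terms(docs: Sequence[Set[str]], labels: Sequence[str]) -> Dict[str, float]:
    """Weight every term by its maximum chi-square score over all categories."""
    n = len(docs)
    cat_sizes = Counter(labels)   # category -> number of documents
    term_df = Counter()           # term -> number of documents containing it
    term_cat_df = Counter()       # (term, category) -> docs in category containing term
    for words, label in zip(docs, labels):
        for w in words:
            term_df[w] += 1
            term_cat_df[(w, label)] += 1

    scores: Dict[str, float] = {}
    for term, df in term_df.items():
        best = 0.0
        for cat, size in cat_sizes.items():
            a = term_cat_df[(term, cat)]
            b = df - a
            c = size - a
            d = n - size - b
            best = max(best, chi_square(a, b, c, d))
        scores[term] = best
    return scores


def select_top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Keep the k highest-scoring terms, breaking ties alphabetically."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Toy example: four tiny documents represented as word sets.
example_docs = [
    {"goal", "match", "the"},
    {"match", "team", "the"},
    {"stock", "market", "the"},
    {"market", "price", "the"},
]
example_labels = ["sport", "sport", "finance", "finance"]
example_scores = score_terms(example_docs, example_labels)
print(select_top_k(example_scores, 2))  # ['market', 'match']
```

In this toy run, "the" occurs in every document and scores 0, while "match" and "market" perfectly separate the two categories and rank highest. Taking the maximum over categories is one common way to turn per-category scores into a single ranking; any other metric computed from the same 2x2 table of counts could replace `chi_square` without changing the selection step.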