Journal: Applied Artificial Intelligence

A NOVEL EMBEDDED FEATURE SELECTION METHOD: A COMPARATIVE STUDY IN THE APPLICATION OF TEXT CATEGORIZATION



Abstract

In text classification based on the vector space model, the high dimensionality of the feature space causes problems, both in computational cost and through over-fitting. Feature selection is an important preprocessing step in text classification: it reduces the size of the vector space, limits computation time, and maintains or improves classification performance. In this study, we use an embedded approach to feature selection in which the Chi-square (CHI) feature selector serves as the filter step, discarding the less discriminative features. For the wrapper step, we propose a novel algorithm that combines the fast global search ability of the genetic algorithm (GA) with the positive feedback mechanism of ant colony optimization (ACO). To validate our approach, we carried out a series of experiments on the Reuters-21578 corpus and compared the results with those of several well-known techniques. In the majority of cases, our method outperformed the other methods.
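The CHI filter step can be illustrated with a minimal sketch. The abstract does not give the scoring formula or the selection threshold, so the code below assumes the standard 2x2 contingency form of the chi-square statistic used in text categorization and a simple top-k cutoff. The function names `chi_square_scores` and `select_top_k` and the toy documents are hypothetical, and the GA/ACO wrapper step is not shown because the abstract does not specify it.

```python
from collections import Counter


def chi_square_scores(docs, labels, category):
    """Score every term against one category with the 2x2 chi-square statistic.

    docs is a list of token lists, labels the matching list of category names.
    This is the textbook formula, assumed here rather than taken from the paper.
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == category)
    n_neg = n - n_pos
    df_pos = Counter()  # documents in the category that contain the term
    df_all = Counter()  # all documents that contain the term
    for tokens, y in zip(docs, labels):
        for term in set(tokens):
            df_all[term] += 1
            if y == category:
                df_pos[term] += 1

    scores = {}
    for term, df in df_all.items():
        a = df_pos[term]  # in category, contains term
        b = df - a        # outside category, contains term
        c = n_pos - a     # in category, lacks term
        d = n_neg - b     # outside category, lacks term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy corpus (hypothetical, not drawn from Reuters-21578).
docs = [
    ["wheat", "grain", "export"],
    ["wheat", "farm", "price"],
    ["oil", "crude", "price"],
    ["oil", "export", "barrel"],
]
labels = ["grain", "grain", "crude", "crude"]

scores = chi_square_scores(docs, labels, "grain")
kept = select_top_k(scores, 2)
```

In this toy corpus, terms that appear equally often inside and outside the category ("price", "export") score zero and would be discarded. The statistic is symmetric, so a term that occurs only outside the category ("oil") scores as high as one that occurs only inside it ("wheat").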
