A COMPARATIVE STUDY OF COMBINED FEATURE SELECTION METHODS FOR ARABIC TEXT CLASSIFICATION

Journal of computer sciences, Science Publications

Abstract

Text classification is an important task because of the huge volume of electronic documents. One of its main problems is the high dimensionality of the feature space. Researchers have proposed many algorithms for selecting relevant features from text; these algorithms have been studied extensively for English, while studies for Arabic remain limited. This study investigates the performance of five widely used feature selection methods, namely Chi-square, Correlation, GSS Coefficient, Information Gain and ReliefF. In addition, it introduces an approach that combines feature selection methods based on the average weight of the features. Experiments are conducted using Naïve Bayes and Support Vector Machine classifiers on a published Arabic corpus. Among the individual methods, Information Gain gave the best results, and the combination of multiple feature selection methods outperformed the best results obtained by any individual method.
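The idea of combining feature selection methods by averaging feature weights can be illustrated with a minimal Python sketch. It is not the authors' implementation: it covers only two of the five methods (Chi-square and Information Gain, computed over binary term presence), and it assumes each method's scores are min-max normalized to [0, 1] before averaging, since the abstract does not say how scores on different scales are reconciled. Taking the maximum chi-square over classes is likewise an assumption. The function names and the four-document toy corpus are hypothetical.

```python
import math
from collections import Counter


def chi_square(docs, labels, term):
    """Chi-square score of `term`: maximum over classes of the 2x2
    term-presence vs. class-membership statistic (max is an assumption)."""
    n = len(docs)
    best = 0.0
    for cls in set(labels):
        n11 = sum(1 for d, y in zip(docs, labels) if term in d and y == cls)      # present, in class
        n10 = sum(1 for d, y in zip(docs, labels) if term in d and y != cls)      # present, other classes
        n01 = sum(1 for d, y in zip(docs, labels) if term not in d and y == cls)  # absent, in class
        n00 = n - n11 - n10 - n01                                                 # absent, other classes
        denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
        if denom:
            best = max(best, n * (n11 * n00 - n10 * n01) ** 2 / denom)
    return best


def entropy(counts):
    """Shannon entropy (bits) of a class-count distribution."""
    total = sum(counts)
    if not total:
        return 0.0
    return -sum((k / total) * math.log2(k / total) for k in counts if k)


def information_gain(docs, labels, term):
    """IG(t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)]."""
    n = len(docs)
    with_t = [y for d, y in zip(docs, labels) if term in d]
    without_t = [y for d, y in zip(docs, labels) if term not in d]
    h_cond = (len(with_t) / n) * entropy(Counter(with_t).values()) \
        + (len(without_t) / n) * entropy(Counter(without_t).values())
    return entropy(Counter(labels).values()) - h_cond


def min_max(scores):
    # Assumption: rescale each method's scores to [0, 1] so they can be averaged.
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {t: (s - lo) / span if span else 0.0 for t, s in scores.items()}


def combined_ranking(docs, labels, methods, k):
    """Average the normalized weights from several methods; return the top-k terms."""
    vocab = sorted(set().union(*docs))
    normalized = [min_max({t: m(docs, labels, t) for t in vocab}) for m in methods]
    avg = {t: sum(w[t] for w in normalized) / len(normalized) for t in vocab}
    # Sort by descending average weight, breaking ties alphabetically.
    return sorted(vocab, key=lambda t: (-avg[t], t))[:k]


# Hypothetical toy corpus: each document is a set of (Arabic) tokens.
docs = [{"كرة", "مباراة"}, {"كرة", "هدف"}, {"سوق", "سهم"}, {"سوق", "اقتصاد"}]
labels = ["sport", "sport", "economy", "economy"]

print(combined_ranking(docs, labels, [chi_square, information_gain], k=2))
# → ['سوق', 'كرة']
```

In this toy corpus the two terms that perfectly separate the classes receive the top score under both methods, so they also top the averaged ranking. Normalizing first keeps a method with a larger raw scale (chi-square here) from dominating the average.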
