Journal of Information Science

The impact of indexing approaches on Arabic text classification


Abstract

This paper investigates the impact of using different indexing approaches (full-word, stem, and root) when classifying Arabic text. In this study, the naive Bayes classifier is used to construct multinomial classification models, which are evaluated using stratified k-fold cross-validation (k ranges from 2 to 10). The study uses a corpus of 1000 normalized Arabic documents. The results of one experiment show significant accuracy improvements in most k-folds when the full-word form is used. Further experiments show that the classifier achieved its highest accuracy with eight folds, that is, a 7/8 to 1/8 train-test ratio, regardless of the indexing approach used. Overall, the classifier achieved a maximum micro-average accuracy of 99.36% with either the full-word form or the stem form. This indicates that the stem is the better choice when classifying Arabic text: it makes the corpus dataset smaller, which improves both processing time and storage utilization, while still achieving the highest level of accuracy.
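The evaluation setup described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the toy documents are hypothetical English placeholders standing in for index terms (full words, stems, or roots), add-one smoothing is assumed because the abstract does not name a smoothing scheme, and the function names are invented for this sketch. Accuracy is micro-averaged by pooling correct predictions over all held-out documents.

```python
import math
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k, seed=0):
    """Split document indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % k].append(i)  # deal each class round-robin across folds
    return folds


def train_nb(docs, labels):
    """Collect the counts needed by a multinomial naive Bayes model."""
    class_docs = Counter(labels)
    term_counts = defaultdict(Counter)
    vocab = set()
    for terms, y in zip(docs, labels):
        term_counts[y].update(terms)
        vocab.update(terms)
    return class_docs, term_counts, vocab


def predict_nb(model, terms):
    """Return the class with the highest log-posterior (add-one smoothing, assumed)."""
    class_docs, term_counts, vocab = model
    n_docs = sum(class_docs.values())
    best, best_score = None, -math.inf
    for y, n_y in class_docs.items():
        total = sum(term_counts[y].values()) + len(vocab)
        score = math.log(n_y / n_docs)
        for t in terms:
            if t in vocab:  # terms unseen in training are skipped
                score += math.log((term_counts[y][t] + 1) / total)
        if score > best_score:
            best, best_score = y, score
    return best


def cross_validate(docs, labels, k):
    """Micro-average accuracy: correct predictions pooled over all k test folds."""
    correct = 0
    for test_idx in stratified_folds(labels, k):
        held_out = set(test_idx)
        train = [i for i in range(len(docs)) if i not in held_out]
        model = train_nb([docs[i] for i in train], [labels[i] for i in train])
        correct += sum(predict_nb(model, docs[i]) == labels[i] for i in test_idx)
    return correct / len(docs)


# Hypothetical toy corpus; each document is a list of index terms.
docs = [
    ["match", "goal", "team"], ["team", "player", "goal"],
    ["match", "player", "coach"], ["coach", "team", "win"],
    ["bank", "market", "price"], ["market", "trade", "price"],
    ["bank", "trade", "loan"], ["loan", "price", "bank"],
]
labels = ["sports"] * 4 + ["economy"] * 4

for k in range(2, 5):  # the study varies k from 2 to 10 on 1000 documents
    print(k, cross_validate(docs, labels, k))
```

To compare indexing approaches in this sketch, one would run the same loop three times, passing each document as its list of full words, stems, or roots; the stemming and root-extraction step itself is outside the scope of the example.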
