The impact of indexing approaches on Arabic text classification

Amer Al-Badarneh; Emad Al-Shawakfa; Basel Bani-Ismail; Khaleel Al-Rababah; Safwan Shatnawi

Abstract

This paper investigates the impact of using different indexing approaches (full-word, stem, and root) when classifying Arabic text. In this study, the naive Bayes classifier is used to construct the multinomial classification models and is evaluated using stratified k-fold cross-validation (k ranges from 2 to 10). The study uses a corpus of 1000 normalized Arabic documents. The results of one experiment show significant accuracy improvements in most k-folds when the full-word form is used. Further experiments show that the classifier achieved its highest accuracy in the eight-fold setting, which corresponds to a 7/8-1/8 train-test ratio, regardless of the indexing approach used. The overall results show that the classifier achieved a maximum micro-average accuracy of 99.36% with either the full-word form or the stem form. This indicates that the stem is the better choice when classifying Arabic text, because it makes the corpus dataset smaller, which improves both processing time and storage utilization, while still achieving the highest level of accuracy.
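The evaluation setup described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the Laplace smoothing, the round-robin fold assignment, and the toy English tokens standing in for indexed Arabic terms are all assumptions made for illustration. Micro-averaged accuracy is taken here to mean correct predictions pooled over all test folds, which is also an assumption. With k = 8, each round trains on 7/8 of the documents and tests on the remaining 1/8.

```python
import math
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k, seed=0):
    """Split document indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)  # deal round-robin within each class
    return folds


def train_multinomial_nb(docs, labels):
    """Estimate log priors and Laplace-smoothed term log likelihoods."""
    vocab = {tok for doc in docs for tok in doc}
    class_docs = Counter(labels)
    term_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        term_counts[label].update(doc)
    model = {}
    for label, n_docs in class_docs.items():
        denom = sum(term_counts[label].values()) + len(vocab)
        model[label] = {
            "prior": math.log(n_docs / len(docs)),
            "loglik": {t: math.log((term_counts[label][t] + 1) / denom)
                       for t in vocab},
        }
    return model


def predict(model, doc):
    """Pick the class with the highest log score; unseen terms are skipped."""
    def score(label):
        loglik = model[label]["loglik"]
        return model[label]["prior"] + sum(loglik[t] for t in doc if t in loglik)
    return max(sorted(model), key=score)


def cross_validate(docs, labels, k, seed=0):
    """Accuracy pooled over all k test folds (micro-average, as assumed here)."""
    correct = 0
    for test_idx in stratified_folds(labels, k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(docs)) if i not in held_out]
        model = train_multinomial_nb([docs[i] for i in train],
                                     [labels[i] for i in train])
        correct += sum(predict(model, docs[i]) == labels[i] for i in test_idx)
    return correct / len(docs)


# Hypothetical toy corpus: two classes with disjoint vocabularies.
sports = ["match", "goal", "team", "player"]
economy = ["bank", "market", "price", "trade"]
rng = random.Random(1)
docs, labels = [], []
for _ in range(20):
    docs.append(rng.choices(sports, k=5))
    labels.append("sports")
    docs.append(rng.choices(economy, k=5))
    labels.append("economy")

for k in range(2, 11):  # k ranges from 2 to 10, as in the abstract
    print(k, cross_validate(docs, labels, k))
```

To compare indexing approaches in this sketch, the same documents would be tokenized three ways (full words, stems, roots) before being passed to `cross_validate`; the stemmer and root extractor are outside its scope.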


Bibliographic details

Source
Journal of Information Science, 2017, Issue 2, pp. 159-173 (15 pages)
Authors
Amer Al-Badarneh; Emad Al-Shawakfa; Basel Bani-Ismail; Khaleel Al-Rababah; Safwan Shatnawi
Author affiliations

Jordan University of Science & Technology, Jordan;

Yarmouk University, Jordan;

Sultan Qaboos University, Oman;

University of New Brunswick, Canada;

University of Bahrain, Bahrain;

Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
Original format: PDF
Language: English
Keywords
Bayesian classifier; cross-validation; statistical classifier; text categorization; text classification; text retrieval;


