Comparative evaluation of text classification techniques using a large diverse Arabic dataset

Mohammad S. Khorsheed; Abdulmohsen O. Al-Thubaity

Abstract

A vast amount of valuable human knowledge is recorded in documents. The rapid growth in the number of machine-readable documents available for public or private access necessitates automatic text classification. While much effort has been devoted to Western languages, mostly English, minimal experimentation has been done with Arabic. This paper presents, first, an up-to-date review of work in the field of Arabic text classification and, second, a large and diverse dataset that can be used for benchmarking Arabic text classification algorithms. The techniques identified in the literature review are illustrated by applying them to the proposed dataset. Across the various feature selection methods, term weighting methods, and classification algorithms, the results show, on average, the superiority of support vector machines (SVM), followed by the C4.5 decision tree algorithm and Naïve Bayes. The best classification accuracy was 97% on the Islamic Topics dataset, and the lowest was 61% on the Arabic Poems dataset.
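
Naïve Bayes is one of the three classifiers the abstract ranks. As a minimal sketch only, assuming pre-tokenized documents, raw term-frequency features, and Laplace smoothing with `alpha=1.0` (none of which is claimed to match the feature selection or weighting schemes evaluated in the paper), a multinomial Naïve Bayes classifier can be written in plain Python. The function names `train_multinomial_nb` and `predict` and the four-document toy corpus are hypothetical illustrations, not the paper's code or data.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log class priors and Laplace-smoothed log term likelihoods."""
    vocab = {tok for doc in docs for tok in doc}
    class_counts = Counter(labels)
    term_counts = defaultdict(Counter)  # class label -> term -> frequency
    for doc, label in zip(docs, labels):
        term_counts[label].update(doc)
    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_lik = {}
    for c in class_counts:
        # Denominator: all tokens in class c plus alpha for every vocabulary term.
        denom = sum(term_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {t: math.log((term_counts[c][t] + alpha) / denom) for t in vocab}
    return log_prior, log_lik

def predict(doc, log_prior, log_lik):
    """Pick the class with the highest log posterior; out-of-vocabulary tokens are skipped."""
    scores = {
        c: log_prior[c] + sum(log_lik[c][t] for t in doc if t in log_lik[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)

# Hypothetical toy corpus of pre-tokenized Arabic snippets (not from the paper's dataset).
docs = [
    ["فريق", "مباراة", "هدف"],  # team, match, goal
    ["مباراة", "كرة", "فريق"],  # match, ball, team
    ["سوق", "أسهم", "نفط"],     # market, shares, oil
    ["نفط", "سعر", "سوق"],      # oil, price, market
]
labels = ["sports", "sports", "economy", "economy"]

prior, lik = train_multinomial_nb(docs, labels)
print(predict(["مباراة", "فريق"], prior, lik))  # sports
```

Scores are summed in log space rather than multiplied as raw probabilities, which avoids floating-point underflow on long documents; the smoothing term keeps a single unseen class-term pair from zeroing out a class.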

Bibliographic details

Source: Language Resources and Evaluation, 2013, no. 2, pp. 513-538 (26 pages)
Authors: Mohammad S. Khorsheed; Abdulmohsen O. Al-Thubaity
Affiliation (both authors): King Abdulaziz City for Science and Technology

Format: PDF
Language: English
Keywords: Machine learning; Arabic text categorization; Arabic text classification

Similar documents

1. Mohammad S. Khorsheed, Abdulmohsen O. Al-Thubaity. Comparative evaluation of text classification techniques using a large diverse Arabic dataset. Language Resources and Evaluation, 2013, no. 2.

2. Ghassan Kanaan, Riyad Al-Shalabi, Sameh Ghwanmeh. A Comparison of Text-Classification Techniques Applied to Arabic Text. Journal of the American Society for Information Science and Technology, 2009, no. 9.

3. Fekry Olayah, Waseem Alromima. Automatic Machine Learning Techniques (AMLT) for Arabic Text Classification Based on Term Collocations. Journal of Theoretical and Applied Information Technology, 2018, no. 12.

4. Mokhtar Ali Hasan Madhfar, Mohammed Abdullah Hassan Al-Hagery. Arabic Text Classification: A Comparative Approach Using a Big Dataset. International Conference on Computer and Information Sciences, 2019.

5. Baligh Mohammed Al-Helali. Online Arabic Text Recognition Using Statistical Techniques. Dissertation, 2016.

6. Matěj Holec, Jiří Kléma, Filip Železný. Comparative evaluation of set-level techniques in predictive classification of gene expression samples. 2012.

7. May Yacoub Adib Al-Nashashibi. Arabic Language Processing for Text Classification: Contributions to Arabic Root Extraction Techniques, Building an Arabic Corpus, and to Arabic Text Classification Techniques. 2012.