Journal: International Journal of Artificial Intelligence & Applications (IJAIA)
An Effective Arabic Text Classification Approach Based on Kernel Naive Bayes Classifier

Abstract

With the growing volume of electronic documents used in many applications, a fast and accurate text classification method is very important. Arabic text classification is one of the most challenging topics, probably because Arabic words have unlimited variation in meaning, in addition to problems that are specific to the Arabic language. Many studies have shown that the Naive Bayes (NB) classifier is relatively robust, easy to implement, fast, and accurate in many different fields, including text classification. However, non-linear classification problems and strong violations of the independence assumption can lead to very poor performance of the NB classifier. In this paper, first, we preprocess the Arabic documents to tokenize only the Arabic words. Second, we convert those words into vectors using the term frequency-inverse document frequency (TF-IDF) technique. Third, we propose an efficient approach based on the Kernel Naive Bayes (KNB) classifier to solve the non-linearity problem of Arabic text classification. Finally, experimental results and a performance evaluation on our collected Arabic topic mining corpus are presented, showing the effectiveness of the proposed KNB classifier against other baseline classifiers.
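
The three steps named in the abstract (Arabic-only tokenization, TF-IDF vectorization, Kernel Naive Bayes) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract gives no details, so the Unicode range used for "Arabic words", the smoothed IDF variant, the Gaussian kernel, the `bandwidth` value, and the four toy documents are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Assumption: "Arabic words" means runs of basic Arabic letters (U+0621..U+064A).
ARABIC_WORD = re.compile(r"[\u0621-\u064A]+")

def tokenize(text):
    """Keep only runs of Arabic letters; Latin text, digits, punctuation are dropped."""
    return ARABIC_WORD.findall(text)

def fit_tfidf(token_docs):
    """Learn a sorted vocabulary and a smoothed IDF weight per term."""
    vocab = sorted({t for doc in token_docs for t in doc})
    n = len(token_docs)
    df = Counter(t for doc in token_docs for t in set(doc))
    # Assumption: smoothed IDF, log((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    return vocab, idf

def tfidf_vector(tokens, vocab, idf):
    """Relative term frequency times IDF, in vocabulary order."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[t] / total * idf[t] for t in vocab]

class KernelNaiveBayes:
    """Naive Bayes whose per-feature class densities are Gaussian kernel
    density estimates instead of a single parametric distribution."""

    def __init__(self, bandwidth=0.1):  # hypothetical bandwidth, not from the paper
        self.h = bandwidth

    def fit(self, X, y):
        self.by_class = {}
        for row, label in zip(X, y):
            self.by_class.setdefault(label, []).append(row)
        self.log_prior = {c: math.log(len(rows) / len(X))
                          for c, rows in self.by_class.items()}
        return self

    def _log_density(self, value, samples):
        # Average of Gaussian kernels centred on each training sample.
        norm = self.h * math.sqrt(2 * math.pi)
        dens = sum(math.exp(-0.5 * ((value - s) / self.h) ** 2)
                   for s in samples) / (len(samples) * norm)
        return math.log(dens + 1e-300)  # guard against log(0)

    def predict(self, x):
        scores = {}
        for c, rows in self.by_class.items():
            score = self.log_prior[c]
            for j, value in enumerate(x):
                score += self._log_density(value, [r[j] for r in rows])
            scores[c] = score
        return max(scores, key=scores.get)

# Toy corpus (illustrative only): two sports and two economy sentences.
train_texts = [
    "فاز الفريق في المباراة",
    "سجل اللاعب هدفا في المباراة",
    "ارتفع سعر النفط في السوق",
    "انخفض سعر الذهب في السوق",
]
train_labels = ["sports", "sports", "economy", "economy"]

token_docs = [tokenize(t) for t in train_texts]
vocab, idf = fit_tfidf(token_docs)
X_train = [tfidf_vector(doc, vocab, idf) for doc in token_docs]
model = KernelNaiveBayes(bandwidth=0.1).fit(X_train, train_labels)

query = tfidf_vector(tokenize("فاز اللاعب في المباراة"), vocab, idf)
prediction = model.predict(query)
print(prediction)
```

Because each feature's class-conditional density is a sum of kernels centred on the training values, the model can represent multi-modal feature distributions that a single Gaussian per class cannot. The per-feature product still relies on the independence assumption, so this sketch addresses only the non-linearity issue mentioned in the abstract.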
