Journal: The International Arab Journal of Information Technology
Arabic Text Classification Using K-Nearest Neighbour Algorithm

Abstract

Many algorithms have been applied to the problem of Automatic Text Categorization (ATC). Most of the work in this area has been carried out on English texts, with only a few researchers addressing Arabic texts. We investigated the use of the K-Nearest Neighbour (K-NN) classifier with a new similarity measure (I-new), as well as with the cosine, Jaccard and Dice similarities, in order to enhance Arabic ATC. We represented the dataset in both unstemmed and stemmed form, using TREC-2002 to remove prefixes and suffixes. For statistical text representation, Bag-Of-Words (BOW) and character-level 3-grams (3-Gram) were used. To reduce the dimensionality of the feature space, we used several feature selection methods. Experiments conducted with Arabic text showed that the K-NN classifier with the new similarity measure I-new achieved 92.6% Macro-F1 and performed better than the K-NN classifier with the cosine, Jaccard and Dice similarities. Chi-square feature selection with BOW representation gave the best performance among the feature selection methods tested with BOW and 3-Gram.
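
As an illustration of the pipeline the abstract describes, the following is a minimal sketch of a K-NN text classifier over BOW or character 3-gram counts, using the cosine, Jaccard and Dice similarities. It is not the authors' implementation. The function names, the toy English documents and the use of raw term counts are all assumptions made for illustration; the vector (extended) forms of Jaccard and Dice are assumed, and the I-new measure is omitted because the abstract does not define it. Stemming and feature selection are likewise left out.

```python
from collections import Counter
from math import sqrt


def bow(text):
    """Bag-of-words representation: counts of whitespace-separated tokens."""
    return Counter(text.split())


def char_3grams(text):
    """Character-level 3-gram representation: counts of overlapping triples."""
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def _dot(a, b):
    # Only terms present in both vectors contribute to the dot product.
    return sum(a[t] * b[t] for t in a.keys() & b.keys())


def _sq_norm(a):
    return sum(v * v for v in a.values())


def cosine(a, b):
    denom = sqrt(_sq_norm(a)) * sqrt(_sq_norm(b))
    return _dot(a, b) / denom if denom else 0.0


def jaccard(a, b):
    # Extended (vector) Jaccard: a.b / (|a|^2 + |b|^2 - a.b). Assumed form.
    d = _dot(a, b)
    denom = _sq_norm(a) + _sq_norm(b) - d
    return d / denom if denom else 0.0


def dice(a, b):
    # Vector Dice: 2 a.b / (|a|^2 + |b|^2). Assumed form.
    denom = _sq_norm(a) + _sq_norm(b)
    return 2 * _dot(a, b) / denom if denom else 0.0


def knn_classify(query, training, k=3, similarity=cosine, featurize=bow):
    """Return the majority label among the k training documents most similar to query."""
    q = featurize(query)
    scored = sorted(
        ((similarity(q, featurize(doc)), label) for doc, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus (English placeholders, not the paper's data).
training = [
    ("football match goal", "sport"),
    ("team goal score", "sport"),
    ("bank market stock", "economy"),
    ("stock price market", "economy"),
]

if __name__ == "__main__":
    for sim in (cosine, jaccard, dice):
        print(sim.__name__, knn_classify("goal score match", training, similarity=sim))
```

Passing `featurize=char_3grams` switches the same classifier to the character-level representation; on real data a feature selection step such as chi-square would normally be applied before computing similarities.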
