An Intelligent System For Arabic Text Categorization


Abstract

Text categorization (classification) is the process of assigning documents to a predefined set of categories based on their content. This paper presents an intelligent Arabic text categorization system built on machine learning algorithms. Several algorithms for stemming and feature selection are evaluated, and documents are represented using several term weighting schemes. The k-nearest neighbor and Rocchio classifiers are then used for classification. Experiments on a self-collected corpus show that the proposed hybrid of statistical and light stemmers is the most suitable stemming algorithm for Arabic. The results also show that a hybrid of document frequency and information gain is the preferable feature selection criterion, and that normalized tf-idf is the best weighting scheme. Finally, the Rocchio classifier outperforms the k-nearest neighbor classifier. The experimental results indicate that the proposed model is efficient and achieves a generalization accuracy of about 98%.
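The weighting and classification stages named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything here is an assumption: tf-idf is taken as term frequency times log(N/df) with L2 normalization, Rocchio is reduced to its centroid-only form (one normalized prototype per class, no negative-example term), and similarity is the dot product of unit vectors (cosine). The toy English tokens, labels, and function names are hypothetical, and stemming and feature selection are omitted.

```python
import math
from collections import Counter, defaultdict

def fit_idf(docs):
    """Compute idf(t) = log(N / df(t)) from tokenized training documents."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / count) for term, count in df.items()}

def tfidf_vector(doc, idf):
    """Return an L2-normalized tf-idf vector as a sparse dict (assumed scheme)."""
    tf = Counter(term for term in doc if term in idf)
    weights = {term: freq * idf[term] for term, freq in tf.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    if norm == 0.0:
        return {}
    return {term: w / norm for term, w in weights.items()}

def dot(a, b):
    """Dot product of two sparse vectors; equals cosine for unit vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(term, 0.0) for term, w in a.items())

def train_rocchio(vectors, labels):
    """Build one normalized centroid per class (centroid-only Rocchio)."""
    sums = defaultdict(lambda: defaultdict(float))
    for vec, label in zip(vectors, labels):
        for term, w in vec.items():
            sums[label][term] += w
    centroids = {}
    for label, total in sums.items():
        norm = math.sqrt(sum(w * w for w in total.values()))
        centroids[label] = {t: w / norm for t, w in total.items()} if norm else {}
    return centroids

def predict_rocchio(vec, centroids):
    """Pick the class whose centroid is most similar to the document."""
    return max(centroids, key=lambda label: dot(vec, centroids[label]))

def predict_knn(vec, vectors, labels, k=3):
    """Majority vote among the k most similar training documents."""
    ranked = sorted(zip(vectors, labels), key=lambda p: dot(vec, p[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy corpus; a real system would use stemmed Arabic tokens.
train_docs = [
    ["match", "goal", "team"],
    ["team", "player", "goal"],
    ["market", "stock", "price"],
    ["price", "bank", "market"],
]
train_labels = ["sports", "sports", "economy", "economy"]

idf = fit_idf(train_docs)
train_vectors = [tfidf_vector(doc, idf) for doc in train_docs]
centroids = train_rocchio(train_vectors, train_labels)

query_vec = tfidf_vector(["goal", "team", "price"], idf)
rocchio_label = predict_rocchio(query_vec, centroids)
knn_label = predict_knn(query_vec, train_vectors, train_labels, k=3)
print(rocchio_label, knn_label)
```

On this toy corpus both classifiers assign the query to "sports". Rocchio compares each document against one prototype per class, while k-nearest neighbor compares it against every training document, so Rocchio is cheaper at prediction time. The sketch says nothing about the accuracy gap reported in the abstract.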

Bibliographic details

  • Year: 2006
  • Original format: PDF
  • Language: English
