A Comparative Study of Root-Based and Stem-Based Approaches for Measuring the Similarity Between Arabic Words for Arabic Text Mining Applications

Abstract

A representation of the semantic information contained in words is needed for any Arabic text mining application. More precisely, the purpose is to better take into account the semantic dependencies between words, as expressed by the co-occurrence frequencies of these words. There have been many proposals to compute similarities between words based on their distributions in contexts. In this paper, we compare and contrast the effect of two preprocessing techniques applied to an Arabic corpus, the Root-based (Stemming) and Stem-based (Light Stemming) approaches, for measuring the similarity between Arabic words with the well-known abstractive model Latent Semantic Analysis (LSA) and a wide variety of distance functions and similarity measures, such as the Euclidean Distance, Cosine Similarity, Jaccard Coefficient, and the Pearson Correlation Coefficient. The obtained results show that, on the one hand, the variety of the corpus produces more accurate results; on the other hand, the Stem-based approach outperformed the Root-based one because the latter affects the words' meanings.
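The four measures named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function names and the example vectors are hypothetical, the vectors merely stand in for word representations such as rows of an LSA-reduced matrix, and the Jaccard coefficient is assumed here to be the extended (Tanimoto) form for real-valued vectors, since the abstract does not say which variant was used.

```python
import math


def euclidean_distance(u, v):
    """Straight-line distance between two vectors (0 means identical)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def jaccard_coefficient(u, v):
    """Extended (Tanimoto) Jaccard coefficient for real-valued vectors.

    Assumption: this variant is chosen for illustration only.
    """
    dot = sum(a * b for a, b in zip(u, v))
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    return dot / (sq_u + sq_v - dot)


def pearson_correlation(u, v):
    """Pearson correlation coefficient of two non-constant vectors."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


# Hypothetical word vectors, e.g. two rows of a reduced term-context matrix.
word_a = [1.0, 2.0, 3.0]
word_b = [2.0, 4.0, 6.0]

print("Euclidean distance: ", euclidean_distance(word_a, word_b))
print("Cosine similarity:  ", cosine_similarity(word_a, word_b))
print("Jaccard coefficient:", jaccard_coefficient(word_a, word_b))
print("Pearson correlation:", pearson_correlation(word_a, word_b))
```

The sketch assumes non-zero, non-constant vectors of equal length. Note that the Euclidean Distance is a distance (smaller means more similar), while the other three are similarities (larger means more similar), so their values are not directly comparable without a conversion.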

Bibliographic details

  • Author: Hanane FROUD
  • Year: 2012
  • Original format: PDF
