Journal: The International Arab Journal of Information Technology

Effective Unsupervised Arabic Word Stemming: Towards an Unsupervised Radicals Extraction



Abstract

This paper presents a new, fully unsupervised stemming approach for classical Arabic that is about 90% effective. The stemming is meant as a preparatory step for unsupervised root (i.e., radicals) extraction. As learning input, our stemming system requires no linguistic knowledge, only plain classical Arabic text. Once the learning input has been analyzed, the system can extract the strongest segment of a given length, namely the stem. We start with a definition of the targeted stem, then show how our system achieves about 90% true positives after learning from fewer than 15,000 words. Unlike other unsupervised approaches, ours does not assume that the input text is error-free and deals efficiently with possible (in practice very frequent) misspellings. The test corpus we used is an authoritative reference for classical Arabic, and its labeling was rigorously done by a team of experts.
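The abstract does not say how the "strength" of a segment is measured. As a rough illustration only, the sketch below assumes that strength is simply how often a character segment occurs in the learning text, and picks the most frequent segment of a fixed length inside a word. The function names (`learn_segment_counts`, `strongest_segment`), the frequency-based score, and the Latin-transliterated toy words are all assumptions made for this example, not the authors' method or data.

```python
from collections import Counter


def learn_segment_counts(words, length):
    """Count every contiguous character segment of the given length.

    Assumption for illustration: raw frequency stands in for the
    paper's (unspecified) notion of segment strength.
    """
    counts = Counter()
    for word in words:
        for i in range(len(word) - length + 1):
            counts[word[i:i + length]] += 1
    return counts


def strongest_segment(word, counts, length):
    """Return the segment of `word` with the highest learned count.

    Ties go to the leftmost segment; words no longer than `length`
    are returned unchanged.
    """
    if len(word) <= length:
        return word
    candidates = [word[i:i + length] for i in range(len(word) - length + 1)]
    return max(candidates, key=lambda seg: counts[seg])


# Toy learning input (hypothetical transliterated forms, not real corpus data).
training_words = ["kitab", "kitabi", "alkitab", "kitabuna"]
counts = learn_segment_counts(training_words, length=5)

print(strongest_segment("alkitabi", counts, length=5))  # prints "kitab"
```

A pure frequency count like this needs no linguistic knowledge, which matches the setting described above, but it makes no attempt to handle misspellings; the abstract claims the actual system does.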
