Journal: Information Processing & Management

Character contiguity in N-gram-based word matching: the case for Arabic text searching


Abstract

This work assesses the performance of two N-gram matching techniques for Arabic root-driven string searching: contiguous N-grams and hybrid N-grams, which combine contiguous and non-contiguous N-grams. The two techniques were tested in three experiments involving different levels of textual word stemming, using a textual corpus of about 25,000 words (about 160 KB in total) and a set of 100 query words. The hybrid approach showed a significant performance improvement over the conventional contiguous approach, especially when stemming was used. The present results, together with the inconsistent findings of previous studies, raise questions about the efficiency of pure conventional N-gram matching and about how it should be used in languages other than English. © 2004 Elsevier Ltd. All rights reserved.
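The difference between the two techniques can be illustrated with a minimal sketch. The abstract does not give the N-gram size, the gap width, or the similarity measure, so everything below is an assumption made for illustration: N = 2 (bigrams), non-contiguous bigrams that skip exactly one character, and the Dice coefficient over sets of bigrams. The function names and the example root/word pair are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only; not the paper's implementation.
# Assumptions: bigrams (N = 2), a gap of exactly one character for the
# non-contiguous bigrams, and Dice similarity over *sets* of bigrams.

def contiguous_bigrams(word: str) -> set[str]:
    """Pairs of adjacent characters."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def skip_bigrams(word: str) -> set[str]:
    """Pairs of characters separated by exactly one character."""
    return {word[i] + word[i + 2] for i in range(len(word) - 2)}


def hybrid_bigrams(word: str) -> set[str]:
    """Union of contiguous and one-character-gap bigrams."""
    return contiguous_bigrams(word) | skip_bigrams(word)


def dice(a: set[str], b: set[str]) -> float:
    """Dice coefficient: 2 * |A intersect B| / (|A| + |B|)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical example: the root k-t-b and a derived word with an infixed alef.
ROOT = "كتب"   # k-t-b
WORD = "كاتب"  # k-a-t-b ("writer")

contiguous_score = dice(contiguous_bigrams(ROOT), contiguous_bigrams(WORD))
hybrid_score = dice(hybrid_bigrams(ROOT), hybrid_bigrams(WORD))

print(f"contiguous: {contiguous_score:.2f}")  # 0.40
print(f"hybrid:     {hybrid_score:.2f}")      # 0.50
```

In this toy case the infixed letter breaks one of the root's two contiguous bigrams, while the one-character gap lets the hybrid set recover it. This is only meant to show why non-contiguous N-grams might help with infixing morphology; it does not reproduce the experiments described in the abstract.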
