Character contiguity in N-gram-based word matching: the case for Arabic text searching

Mustafa SH

Abstract

This work assesses the performance of two N-gram matching techniques for Arabic root-driven string searching: contiguous N-grams and hybrid N-grams, which combine contiguous and non-contiguous N-grams. The two techniques were tested in three experiments involving different levels of textual word stemming, using a textual corpus of about 25 thousand words (about 160 KB in total) and a set of 100 query words. The hybrid approach showed a significant performance improvement over the conventional contiguous approach, especially where stemming was used. These results, together with the inconsistent findings of previous studies, raise questions about the efficiency of pure conventional N-gram matching and about how it should be used in languages other than English. (c) 2004 Elsevier Ltd. All rights reserved.
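
The difference between the two techniques named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact N-gram size, the gap used for non-contiguous N-grams, or the similarity measure, so the choices here are assumptions for illustration only: digrams (N = 2), non-contiguous digrams that skip exactly one character, and the Dice coefficient. The function names are hypothetical, and the words are Latin transliterations of the Arabic root k-t-b and the derived word kAtb ("writer", with A standing for the infixed alif).

```python
from typing import Set


def contiguous_ngrams(word: str, n: int = 2) -> Set[str]:
    """Return the set of substrings of length n made of adjacent characters."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def skip_digrams(word: str, gap: int = 1) -> Set[str]:
    """Return digrams whose two characters are separated by `gap` characters.

    Assumption: a gap of one character; the abstract does not state the gap.
    """
    return {word[i] + word[i + gap + 1] for i in range(len(word) - gap - 1)}


def hybrid_digrams(word: str) -> Set[str]:
    """Union of contiguous and non-contiguous digrams (illustrative 'hybrid')."""
    return contiguous_ngrams(word, 2) | skip_digrams(word, 1)


def dice(a: Set[str], b: Set[str]) -> float:
    """Dice coefficient of two sets; assumed here as the similarity measure."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


root = "ktb"   # transliterated root k-t-b
word = "kAtb"  # transliterated derived word; the infix A separates k and t

contiguous_score = dice(contiguous_ngrams(root), contiguous_ngrams(word))
hybrid_score = dice(hybrid_digrams(root), hybrid_digrams(word))

print(contiguous_score)  # 0.4
print(hybrid_score)      # 0.5
```

In this toy case the infix breaks the contiguous digram "kt", so the contiguous sets share only "tb". The skip digram over the infix restores "kt", which raises the hybrid score. This is consistent in spirit with the improvement the abstract reports, but it is not a reproduction of the paper's experiments.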

Bibliographic details

Source
Information Processing & Management, 2005, issue 4, pp. 819-827 (9 pages)
Author
Mustafa SH
Affiliation

Yarmouk Univ, Fac Informat Technol, Dept Comp Informat Syst, Irbid, Jordan;

Indexing information
Original format: PDF
Language: English
Chinese Library Classification: Library science, librarianship
Keywords
N-grams; string matching; text searching; stemming; word conflation;

Date added to database: 2022-08-17 23:20:34
