Published in: International Conference on Asian Digital Libraries

The Effectiveness of a Graph-Based Algorithm for Stemming


Abstract

In Information Retrieval (IR), stemming allows query terms and document terms that share the same meaning but appear in different morphological variants to be matched. In this paper we propose and evaluate a statistical, graph-based algorithm for stemming. Considering that a word is formed by a stem (prefix) and a derivation (suffix), the key idea is that strongly interlinked prefixes and suffixes form a community of sub-strings. Discovering these communities means searching for the word splits that yield the best word stems. We conducted experiments on the CLEF 2001 test sub-collections for the Italian language. The results show that stemming improves IR effectiveness. They also show that the effectiveness of our algorithm is comparable to that of an algorithm based on a priori linguistic knowledge. This is an encouraging result, particularly in a multilingual context.
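The abstract describes the community idea only at a high level. As a rough, hypothetical illustration, the sketch below links every candidate prefix of a word to the suffix it was split from, producing a bipartite prefix-to-suffix graph. It then scores prefixes and suffixes by HITS-style mutual reinforcement (a prefix scores highly when it links to high-scoring suffixes, and vice versa) and stems each word at the split whose prefix and suffix scores have the largest product. The HITS scoring, the product rule, the 50-iteration cutoff, the toy vocabulary and the names `candidate_splits`, `hits_scores` and `stem` are all assumptions made for illustration. This is not necessarily the scoring used in the paper.

```python
import math
from collections import defaultdict


def candidate_splits(word):
    """Every (prefix, suffix) split with a non-empty prefix and a non-empty suffix."""
    return [(word[:i], word[i:]) for i in range(1, len(word))]


def hits_scores(words, iterations=50):
    """HITS-style mutual reinforcement on the bipartite prefix -> suffix graph.

    Hypothetical scoring: a prefix (hub) is good if it links to good suffixes,
    and a suffix (authority) is good if good prefixes link to it.
    """
    out_links = defaultdict(set)  # prefix -> suffixes seen after it
    in_links = defaultdict(set)   # suffix -> prefixes seen before it
    for word in set(words):
        for prefix, suffix in candidate_splits(word):
            out_links[prefix].add(suffix)
            in_links[suffix].add(prefix)

    hub = {p: 1.0 for p in out_links}
    auth = {s: 1.0 for s in in_links}
    for _ in range(iterations):
        # Suffix score = sum of the scores of the prefixes that link to it.
        auth = {s: sum(hub[p] for p in ps) for s, ps in in_links.items()}
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {s: v / norm for s, v in auth.items()}
        # Prefix score = sum of the scores of the suffixes it links to.
        hub = {p: sum(auth[s] for s in ss) for p, ss in out_links.items()}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth


def stem(word, hub, auth):
    """Return the prefix of the split with the highest hub * authority product."""
    splits = candidate_splits(word)
    if not splits:  # one-letter (or empty) words cannot be split
        return word
    prefix, _ = max(splits, key=lambda ps: hub.get(ps[0], 0.0) * auth.get(ps[1], 0.0))
    return prefix


if __name__ == "__main__":
    vocabulary = ["gatto", "gatti", "gatta", "cane", "cani", "rosso", "rossi", "rossa"]
    hub, auth = hits_scores(vocabulary)
    for w in vocabulary:
        print(w, "->", stem(w, hub, auth))
    # A query term and a document term match if they share a stem.
    print(stem("gatti", hub, auth) == stem("gatta", hub, auth))
```

On this toy Italian vocabulary, the densely interlinked group of prefixes {gatt, ross, can} and suffixes {o, i, a, e} dominates the scores. The query term "gatti" and the document term "gatta" should therefore both reduce to "gatt" and match. The sketch deliberately excludes empty suffixes. Under this scoring rule, a single empty suffix shared by every whole word would tend to attract most of the weight and leave words unstemmed. A realistic stemmer would also need a much larger vocabulary and safeguards such as a minimum stem length.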