International Conference on Advances in Natural Language Processing (NLP, FinTAL 2006); 23-25 August 2006; Turku (FI)

Is a Morphologically Complex Language Really that Complex in Full-Text Retrieval?

Abstract

In this paper we show that keyword variation in a morphologically complex language, Finnish, can be handled effectively for IR purposes by generating only the textually most frequent forms of each keyword. In theory, Finnish nouns have about 2,000 different forms, but most of these forms occur rarely. Corpus statistics showed that about 84-88 per cent of the occurrences of inflected noun forms belong to only six of the 14 possible cases. The resulting number of variant forms per keyword (at most 2*6 = 12) makes it feasible to try them all in a search. Retrieval performance with this frequent-form coverage was tested using three to twelve variant forms per keyword in two test collections, TUTK and the Finnish material of CLEF 2003. The results show that, with nine and twelve variant forms per keyword, the frequent keyword form generation method competes well with the gold standard, lemmatization.
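The idea of searching with a small, fixed set of variant forms can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not name the six cases or describe how forms are generated, so the case selection, the hand-written paradigm for the noun "talo" (house), the ordering of the forms, and the helper names `variant_forms` and `or_query` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical, hand-written paradigm for one noun, "talo" (house).
# Each entry maps a case name to its (singular, plural) form.
# Which six cases to use, and their order, is an assumption here;
# the abstract only states that six of the 14 cases dominate.
PARADIGM: Dict[str, Tuple[str, str]] = {
    "nominative": ("talo", "talot"),
    "genitive": ("talon", "talojen"),
    "partitive": ("taloa", "taloja"),
    "inessive": ("talossa", "taloissa"),
    "elative": ("talosta", "taloista"),
    "illative": ("taloon", "taloihin"),
}


def variant_forms(paradigm: Dict[str, Tuple[str, str]],
                  max_forms: int = 12) -> List[str]:
    """Collect up to max_forms distinct forms, in paradigm order.

    A real system would presumably rank forms by corpus frequency;
    this sketch simply follows the insertion order of the dict.
    """
    forms: List[str] = []
    for singular, plural in paradigm.values():
        for form in (singular, plural):
            if form not in forms:  # skip forms shared by several cases
                forms.append(form)
    return forms[:max_forms]


def or_query(forms: List[str]) -> str:
    """Join the variant forms into a single disjunctive query string."""
    return "(" + " OR ".join(forms) + ")"


if __name__ == "__main__":
    # Mirror the range of form counts mentioned in the abstract.
    for n in (3, 6, 9, 12):
        print(n, or_query(variant_forms(PARADIGM, n)))
```

With six cases in singular and plural, the expansion yields at most 2*6 = 12 forms per keyword, so the disjunctive query stays short enough to send to an ordinary full-text index without lemmatizing the documents.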
