Journal: Information Retrieval

How Effective is Stemming and Decompounding for German Text Retrieval?

Abstract

Information retrieval systems operating on free text face difficulties when word forms used in the query and documents do not match. The usual solution is the use of a "stemming component" that reduces related word forms to a common stem. Extensive studies of such components exist for English, but considerably less is known for other languages. Previously, it has been claimed that stemming is essential for highly declensional languages. We report on our experiments on stemming for German, where an additional issue is the handling of compounds, which are formed by concatenating several words. The major contribution of our work that goes beyond its focus on German lies in the investigation of a complete spectrum of approaches, ranging from language-independent to elaborate linguistic methods. The main findings are that stemming is beneficial even when using a simple approach, and that carefully designed decompounding, the splitting of compound words, remarkably boosts performance. All findings are based on a thorough analysis using a large reliable test collection.
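
To make the two operations named in the abstract concrete, here is a minimal sketch of a simple suffix-stripping stemmer and a dictionary-based decompounder. It is not the method evaluated in the paper: the suffix list, the tiny lexicon, the handling of a linking "s", and the function names `simple_stem` and `decompound` are all assumptions chosen for illustration.

```python
# Illustrative sketch only. The suffix list, the lexicon and the function
# names are hypothetical and are not taken from the paper.

SUFFIXES = ("ern", "en", "er", "es", "em", "e", "n", "s")  # checked in this order
MIN_STEM = 3  # never strip a word down to fewer than 3 characters

# Toy lexicon of known simple words (assumption for the example).
LEXICON = {"wort", "stamm", "arbeit", "zeit"}


def simple_stem(word: str) -> str:
    """Lowercase the word and strip the first matching inflectional suffix."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= MIN_STEM:
            return w[: -len(suffix)]
    return w


def decompound(word: str, lexicon: set, min_part: int = 3) -> list:
    """Split a compound into lexicon words, preferring the longest first part.

    A single linking "s" between parts may be skipped. If no complete split
    is found, the lowercased word is returned unsplit.
    """

    def split(rest):
        if rest in lexicon:
            return [rest]
        for end in range(len(rest) - min_part, min_part - 1, -1):
            head, tail = rest[:end], rest[end:]
            if head not in lexicon:
                continue
            candidates = [tail]
            if tail.startswith("s"):
                candidates.append(tail[1:])  # skip a linking "s"
            for cand in candidates:
                if len(cand) >= min_part:
                    parts = split(cand)
                    if parts:
                        return [head] + parts
        return None

    w = word.lower()
    return split(w) or [w]


if __name__ == "__main__":
    print(simple_stem("Kindern"))               # kind
    print(decompound("Wortstamm", LEXICON))     # ['wort', 'stamm']
    print(decompound("Arbeitszeit", LEXICON))   # ['arbeit', 'zeit']
```

In a retrieval setting, both the query and the documents would be passed through the same functions before indexing, so that "Kindern" and "Kindes" map to one index term and a query for "Zeit" can match "Arbeitszeit".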
