Comparison of Different Lemmatization Approaches through the Means of Information Retrieval Performance


Abstract

This paper presents a quantitative performance analysis of two different approaches to the lemmatization of Czech text data. The first is based on a manually prepared dictionary of lemmas and a set of derivation rules, while the second is based on automatic inference of the dictionary and the rules from training data. The comparison is done by evaluating the mean Generalized Average Precision (mGAP) measure of the lemmatized documents and search queries in a set of information retrieval (IR) experiments. Such a method is suitable for an efficient and rather reliable comparison of lemmatization performance, since correct lemmatization has proven to be crucial for IR effectiveness in highly inflected languages. Moreover, the proposed indirect comparison of the lemmatizers circumvents the need for manually lemmatized test data, which are hard to obtain and also suffer from the problem of incompatible sets of lemmas across different systems.
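The indirect comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two toy lemmatizers, the tiny Czech collection, the term-overlap ranking, and the use of plain mean average precision as a stand-in for mGAP are all hypothetical choices made for the example.

```python
from collections import Counter


def average_precision(ranked_ids, relevant_ids):
    """Plain binary average precision of one ranked list (stand-in for GAP)."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids) if relevant_ids else 0.0


def rank_documents(query_terms, docs):
    """Rank document ids by term overlap with the query (toy scoring).

    Documents that share no term with the query are not returned.
    """
    scores = {}
    for doc_id, terms in docs.items():
        overlap = sum((Counter(terms) & Counter(query_terms)).values())
        if overlap > 0:
            scores[doc_id] = overlap
    return sorted(scores, key=lambda d: (-scores[d], d))


def mean_average_precision(lemmatize, raw_docs, raw_queries, relevance):
    """Lemmatize documents and queries with the same lemmatizer, then score."""
    docs = {d: [lemmatize(w) for w in text.split()] for d, text in raw_docs.items()}
    aps = []
    for q_id, text in raw_queries.items():
        q_terms = [lemmatize(w) for w in text.split()]
        ranking = rank_documents(q_terms, docs)
        aps.append(average_precision(ranking, relevance[q_id]))
    return sum(aps) / len(aps)


# Hypothetical lemmatizer 1: no lemmatization, only lowercasing.
def identity_lemmatizer(word):
    return word.lower()


# Hypothetical lemmatizer 2: lookup in a tiny hand-made dictionary.
TOY_DICTIONARY = {"ženy": "žena", "ženou": "žena", "města": "město", "městě": "město"}


def dictionary_lemmatizer(word):
    w = word.lower()
    return TOY_DICTIONARY.get(w, w)


# Toy collection: inflected forms in documents, base forms in the query.
raw_docs = {"d1": "ženy ve městě", "d2": "pes a kočka", "d3": "ženou města"}
raw_queries = {"q1": "žena město"}
relevance = {"q1": {"d1", "d3"}}

for name, lemmatizer in [("identity", identity_lemmatizer),
                         ("dictionary", dictionary_lemmatizer)]:
    score = mean_average_precision(lemmatizer, raw_docs, raw_queries, relevance)
    print(name, score)
```

The point of the design is that no gold-standard lemmas are needed: each lemmatizer is only required to be consistent with itself across documents and queries, and the retrieval score against relevance judgments is what gets compared.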

Bibliographic details

  • Source
    "Text, speech and dialogue" | 2010 | pp. 93-100 | 8 pages
  • Conference location Brno (CZ)
  • Authors

    Jakub Kanis; Lucie Skorkovska;

  • Author affiliations

    Univ. of West Bohemia, Faculty of Applied Sciences, Dept. of Cybernetics, Univerzitni 8, 306 14 Pilsen, Czech Republic;

    Univ. of West Bohemia, Faculty of Applied Sciences, Dept. of Cybernetics, Univerzitni 8, 306 14 Pilsen, Czech Republic;

  • Original format PDF
  • Text language eng
  • Chinese Library Classification Artificial intelligence theory;