Experiments in Cross-Language Morphological Annotation Transfer


Abstract

Annotated corpora are valuable NLP resources, but they are often costly to create. We introduce a method for transferring annotation from a morphologically annotated corpus of a source language to a target language. Our approach assumes only that an unannotated text corpus exists for the target language and that a simple textbook describing the basic morphological properties of that language is available. Our paper describes experiments with Polish, Czech, and Russian; however, the method is not tied in any way to these languages. In all the experiments we use the TnT tagger, a second-order Markov model. Our approach assumes that the information acquired about one language can be used for processing a related language. We have found that even breathtakingly naive techniques (such as approximating the Russian transitions by Czech and/or Polish ones, and approximating the Russian emissions by manually or automatically derived Czech cognates) can lead to a significant improvement in the tagger's performance.
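The idea of borrowing transitions from a related language can be illustrated with a minimal sketch. It is not the TnT tagger and not the authors' implementation: it uses a first-order (bigram) hidden Markov model rather than a second-order one, and the function names, the three-tag tagset, the toy Czech corpus, and the Russian emission table are all hypothetical values invented for illustration. Transition probabilities are estimated from the tagged source-language sentences, while emission probabilities are supplied separately for the target language, standing in for what cognates or a textbook might provide.

```python
import math
from collections import defaultdict


def train_transitions(tagged_sentences):
    """Estimate add-one-smoothed bigram tag transition log-probabilities
    from a tagged source-language corpus."""
    counts = defaultdict(lambda: defaultdict(int))
    tags = set()
    for sentence in tagged_sentences:
        prev = "<s>"
        for _, tag in sentence:
            counts[prev][tag] += 1
            tags.add(tag)
            prev = tag
    tags = sorted(tags)
    trans = {}
    for prev in ["<s>"] + tags:
        total = sum(counts[prev].values()) + len(tags)
        trans[prev] = {t: math.log((counts[prev][t] + 1) / total) for t in tags}
    return trans, tags


def viterbi(words, trans, tags, emissions, unk=1e-6):
    """Return the most likely tag sequence under a first-order HMM.
    `emissions` maps word -> {tag: P(word | tag)}; unseen pairs get `unk`."""

    def emit(word, tag):
        return math.log(emissions.get(word, {}).get(tag, unk))

    best = {t: trans["<s>"][t] + emit(words[0], t) for t in tags}
    back = []
    for word in words[1:]:
        new_best, pointers = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: best[p] + trans[p][t])
            new_best[t] = best[prev] + trans[prev][t] + emit(word, t)
            pointers[t] = prev
        back.append(pointers)
        best = new_best
    path = [max(tags, key=lambda t: best[t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return list(reversed(path))


# Hypothetical toy source corpus (Czech), tags: A = adjective, N = noun, V = verb.
czech_corpus = [
    [("nová", "A"), ("kniha", "N"), ("leží", "V")],
    [("starý", "A"), ("dům", "N"), ("stojí", "V")],
    [("žena", "N"), ("čte", "V")],
]

# Hypothetical target-language (Russian) emissions; the values are invented.
# "печь" is ambiguous between a noun ("stove") and a verb ("to bake").
russian_emissions = {
    "новая": {"A": 1.0},
    "печь": {"N": 0.5, "V": 0.5},
}

trans, tags = train_transitions(czech_corpus)
predicted = viterbi(["новая", "печь"], trans, tags, russian_emissions)
print(predicted)  # ['A', 'N']
```

In this toy setup the ambiguous Russian word is resolved as a noun only because adjective-to-noun transitions are frequent in the Czech data, which is the intuition behind reusing transitions across related languages.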
