Conference: International Conference on Language Resources and Evaluation
Shallow Discourse Parsing for Under-Resourced Languages: Combining Machine Translation and Annotation Projection

Abstract

Shallow Discourse Parsing (SDP), the identification of coherence relations between text spans, relies on large amounts of training data, which so far exists only for English; in this respect, any other language is under-resourced. For those languages where machine translation (MT) from English is available with reasonable quality, MT in conjunction with annotation projection can be an option for producing an SDP resource. In our study, we translate the English Penn Discourse TreeBank (PDTB) into German and experiment with various methods of annotation projection to arrive at a German counterpart of the PDTB. We describe the key characteristics of the corpus as well as some typical sources of errors encountered during its creation. We then evaluate the GermanPDTB by training components for selected sub-tasks of discourse parsing on this silver data and compare their performance to that of the same components trained on the original, gold PDTB corpus.
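The abstract does not say which projection methods were tried, so the following is only a minimal sketch of one common form of annotation projection, assuming token-level word alignments between the English source and its German translation are already available. The function name `project_span`, the toy sentence pair, and the hand-written alignment are all illustrative assumptions and are not taken from the paper.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: each pair is (source_token_index, target_token_index).
Alignment = List[Tuple[int, int]]
Span = Tuple[int, int]  # inclusive (start, end) token indices


def project_span(span: Span, alignment: Alignment) -> Optional[Span]:
    """Project an inclusive source-side token span onto the target side.

    Returns the smallest inclusive target span that covers every target
    token aligned to a source token inside `span`, or None when no token
    in the span has an alignment link.
    """
    start, end = span
    targets = [t for s, t in alignment if start <= s <= end]
    if not targets:
        return None
    return min(targets), max(targets)


# Toy English sentence and an assumed German translation.
en = ["He", "stayed", "home", "because", "it", "rained"]
de = ["Er", "blieb", "zu", "Hause", ",", "weil", "es", "regnete"]

# Hand-written alignment for illustration; a real pipeline would obtain
# this from a word aligner.
alignment: Alignment = [(0, 0), (1, 1), (2, 2), (2, 3), (3, 5), (4, 6), (5, 7)]

# PDTB-style annotation on the English side: a connective and its two arguments.
annotations: Dict[str, Span] = {
    "connective": (3, 3),  # "because"
    "arg1": (0, 2),        # "He stayed home"
    "arg2": (4, 5),        # "it rained"
}

projected = {role: project_span(span, alignment) for role, span in annotations.items()}

for role, span in projected.items():
    tokens = de[span[0]:span[1] + 1] if span is not None else []
    print(role, span, " ".join(tokens))
```

Taking the minimum and maximum aligned index keeps each projected span contiguous, which is a simplification: unaligned tokens, reordering, and discontinuous connectives would need extra handling and are plausible examples of the error sources the abstract mentions.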