Conference on Empirical Methods in Natural Language Processing

X-SRL: A Parallel Cross-Lingual Semantic Role Labeling Dataset



Abstract

Even though SRL is researched for many languages, major improvements have mostly been obtained for English, for which more resources are available. In fact, existing multilingual SRL datasets contain disparate annotation styles or come from different domains, hampering generalization in multilingual learning. In this work, we propose a method to automatically construct an SRL corpus that is parallel in four languages: English, French, German, Spanish, with unified predicate and role annotations that are fully comparable across languages. We apply high-quality machine translation to the English CoNLL-09 dataset and use multilingual BERT to project its high-quality annotations to the target languages. We include human-validated test sets that we use to measure the projection quality, and show that projection is denser and more precise than a strong baseline. Finally, we train different SOTA models on our novel corpus for mono- and multilingual SRL, showing that the multilingual annotations improve performance especially for the weaker languages.
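To make the projection step concrete, the following Python sketch illustrates cross-lingual annotation projection with multilingual BERT in the spirit of the abstract: the English source sentence and its machine translation are encoded with mBERT, each labeled source word is greedily aligned to its most similar target word, and the SRL label is copied along that link. The model name, the cosine-similarity threshold, and the greedy one-to-one alignment are illustrative assumptions, not the authors' exact pipeline.

# Minimal sketch (assumptions noted above), not the X-SRL authors' exact method.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased")
model.eval()

def word_embeddings(words):
    """Encode a pre-tokenized sentence and mean-pool subword vectors per word."""
    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]            # (num_subwords, dim)
    word_ids = enc.word_ids()
    vecs = []
    for i in range(len(words)):
        idx = [j for j, w in enumerate(word_ids) if w == i]   # subwords of word i
        vecs.append(hidden[idx].mean(dim=0))
    return torch.stack(vecs)                                   # (num_words, dim)

def project_labels(src_words, src_labels, tgt_words, threshold=0.5):
    """Copy each labeled source word's SRL tag to its most similar target word."""
    src_vecs = torch.nn.functional.normalize(word_embeddings(src_words), dim=-1)
    tgt_vecs = torch.nn.functional.normalize(word_embeddings(tgt_words), dim=-1)
    sim = src_vecs @ tgt_vecs.T                                # cosine similarities
    tgt_labels = ["O"] * len(tgt_words)
    for i, label in enumerate(src_labels):
        if label == "O":
            continue
        j = int(sim[i].argmax())
        if sim[i, j] >= threshold:                             # drop low-confidence links
            tgt_labels[j] = label
    return tgt_labels

# Example: project predicate/argument labels from English onto a German translation.
en = ["The", "cat", "chased", "the", "mouse"]
en_labels = ["O", "B-A0", "B-V", "O", "B-A1"]
de = ["Die", "Katze", "jagte", "die", "Maus"]
print(project_labels(en, en_labels, de))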
