Conference on Empirical Methods in Natural Language Processing

A Supervised Word Alignment Method based on Cross-Language Span Prediction using Multilingual BERT



Abstract

We present a novel supervised word alignment method based on cross-language span prediction. We first formalize the word alignment problem as a collection of independent predictions from a token in the source sentence to a span in the target sentence. Since this step is equivalent to a SQuAD v2.0 style question answering task, we solve it using multilingual BERT, which is fine-tuned on manually created gold word alignment data. It is nontrivial to obtain accurate alignments from a set of independently predicted spans. We greatly improve word alignment accuracy by adding the source token's context to the question and by symmetrizing the two directional predictions. In experiments using five word alignment datasets among Chinese, Japanese, German, Romanian, French, and English, we show that our proposed method significantly outperforms previous supervised and unsupervised word alignment methods without using any bitexts for pretraining. For example, we achieve an 86.7 F1 score on the Chinese-English data, which is 13.3 points higher than the previous state-of-the-art supervised method.
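The symmetrization step described above can be illustrated with a minimal sketch. Assume each direction of span prediction has been reduced to a probability for every (source token, target token) pair; one common way to symmetrize, shown here, is to average the forward and backward probabilities and keep pairs above a threshold. The data structures, probabilities, and threshold below are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch of symmetrizing two directional word-alignment predictions.
# src2tgt maps (src_idx, tgt_idx) -> probability from the source-to-target model;
# tgt2src maps (tgt_idx, src_idx) -> probability from the target-to-source model.

def symmetrize(src2tgt, tgt2src, threshold=0.4):
    """Average the two directional probabilities for each candidate pair
    and keep pairs whose mean exceeds the threshold."""
    candidates = set(src2tgt) | {(i, j) for (j, i) in tgt2src}
    alignment = set()
    for (i, j) in candidates:
        p_fwd = src2tgt.get((i, j), 0.0)
        p_bwd = tgt2src.get((j, i), 0.0)
        if (p_fwd + p_bwd) / 2 > threshold:
            alignment.add((i, j))
    return alignment

# Toy example: both directions agree on token 0 -> 0; token 1 -> 2 is
# predicted in only one direction and its averaged score falls below
# the threshold, so it is dropped.
fwd = {(0, 0): 0.9, (1, 2): 0.6}
bwd = {(0, 0): 0.8}
print(sorted(symmetrize(fwd, bwd)))  # [(0, 0)]
```

Averaging (rather than taking a hard intersection) lets a confident prediction in one direction survive a weak but nonzero score in the other, which is why it tends to balance precision and recall.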

