Conference on Empirical Methods in Natural Language Processing

A Supervised Word Alignment Method based on Cross-Language Span Prediction using Multilingual BERT



Abstract

We present a novel supervised word alignment method based on cross-language span prediction. We first formalize the word alignment problem as a collection of independent predictions from a token in the source sentence to a span in the target sentence. Since this step is equivalent to a SQuAD v2.0 style question answering task, we solve it using multilingual BERT, which is fine-tuned on manually created gold word alignment data. It is nontrivial to obtain accurate alignments from a set of independently predicted spans. We greatly improve word alignment accuracy by adding the source token's context to the question and by symmetrizing the two directional predictions. In experiments using five word alignment datasets among Chinese, Japanese, German, Romanian, French, and English, we show that our proposed method significantly outperforms previous supervised and unsupervised word alignment methods without using any bitexts for pretraining. For example, we achieve an 86.7 F1 score on the Chinese-English data, which is 13.3 points higher than the previous state-of-the-art supervised method.
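The symmetrization step described above can be illustrated with a minimal sketch. Assume each direction of span prediction has been reduced to a probability for every (source token, target token) pair; one common way to symmetrize, shown here, is to average the forward and backward probabilities and keep pairs above a threshold. The data structures, probabilities, and threshold below are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch of symmetrizing two directional word-alignment predictions.
# src2tgt maps (src_idx, tgt_idx) -> probability from the source-to-target model;
# tgt2src maps (tgt_idx, src_idx) -> probability from the target-to-source model.

def symmetrize(src2tgt, tgt2src, threshold=0.4):
    """Average the two directional probabilities for each candidate pair
    and keep pairs whose mean exceeds the threshold."""
    candidates = set(src2tgt) | {(i, j) for (j, i) in tgt2src}
    alignment = set()
    for (i, j) in candidates:
        p_fwd = src2tgt.get((i, j), 0.0)
        p_bwd = tgt2src.get((j, i), 0.0)
        if (p_fwd + p_bwd) / 2 > threshold:
            alignment.add((i, j))
    return alignment

# Toy example: both directions agree on token 0 -> 0; token 1 -> 2 is
# predicted in only one direction and its averaged score falls below
# the threshold, so it is dropped.
fwd = {(0, 0): 0.9, (1, 2): 0.6}
bwd = {(0, 0): 0.8}
print(sorted(symmetrize(fwd, bwd)))  # [(0, 0)]
```

Averaging (rather than taking a hard intersection) lets a confident prediction in one direction survive a weak but nonzero score in the other, which is why it tends to balance precision and recall.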

