IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

MAPGN: Masked Pointer-Generator Network for Sequence-to-Sequence Pre-Training

Abstract

This paper presents a self-supervised learning method for pointer-generator networks to improve spoken-text normalization. Spoken-text normalization, which converts spoken-style text into style-normalized text, is becoming an important technology for improving downstream processing such as machine translation and summarization. The most successful spoken-text normalization method to date is sequence-to-sequence (seq2seq) mapping with pointer-generator networks, which have a mechanism for copying tokens from the input sequence. However, these models require a large amount of paired spoken-style and style-normalized text, and such a volume of data is difficult to prepare. To build a spoken-text normalization model from limited paired data, we focus on self-supervised learning, which can exploit unpaired text data to improve seq2seq models. Unfortunately, conventional self-supervised learning methods do not assume the use of pointer-generator networks. We therefore propose a novel self-supervised learning method, the MAsked Pointer-Generator Network (MAPGN). The proposed method effectively pre-trains the pointer-generator network by learning to fill masked tokens using the copy mechanism. Our experiments on two spoken-text normalization tasks demonstrate that MAPGN is more effective for pointer-generator networks than conventional self-supervised learning methods.
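The two ingredients the abstract combines, a copy-capable output distribution and a masked-token pre-training objective on unpaired text, can be illustrated with a small sketch. This is not the authors' implementation. It assumes the widely used pointer-generator mixture, P(w) = p_gen * P_vocab(w) + (1 - p_gen) * (sum of attention weights on source positions holding w), and a generic uniform per-token masking scheme (15% rate, a `<mask>` symbol). The masking details of MAPGN are not given in this abstract, and the names `MASK`, `make_masked_pair` and `final_distribution` are hypothetical. In a real model, p_gen, the attention weights and P_vocab are predicted at each decoding step. Here they are passed in as plain numbers.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

MASK = "<mask>"  # hypothetical mask symbol (assumption, not from the paper)


def make_masked_pair(tokens: Sequence[str], mask_prob: float = 0.15,
                     rng: Optional[random.Random] = None) -> Tuple[List[str], List[str]]:
    """Build a (masked input, original target) training pair from unpaired text.

    Assumed scheme: each token is independently replaced by MASK with
    probability mask_prob; the decoder target is the unmasked sequence.
    """
    rng = rng or random.Random(0)
    source = [MASK if rng.random() < mask_prob else t for t in tokens]
    return source, list(tokens)


def final_distribution(p_vocab: Dict[str, float], attention: Sequence[float],
                       source: Sequence[str], p_gen: float) -> Dict[str, float]:
    """Mix the generation and copy distributions for one decoding step.

    p_vocab:   distribution over the fixed output vocabulary
    attention: attention weights over source positions (should sum to 1)
    source:    source tokens aligned with `attention`
    p_gen:     probability of generating from the vocabulary vs. copying
    """
    if not 0.0 <= p_gen <= 1.0:
        raise ValueError("p_gen must be in [0, 1]")
    # Generation part: scale the vocabulary distribution by p_gen.
    dist = {w: p_gen * p for w, p in p_vocab.items()}
    # Copy part: each source position contributes its attention weight to the
    # token it holds, so words missing from p_vocab can still be produced.
    for token, a in zip(source, attention):
        dist[token] = dist.get(token, 0.0) + (1.0 - p_gen) * a
    return dist


if __name__ == "__main__":
    src, tgt = make_masked_pair("uh i think it is good".split(), mask_prob=0.3)
    print("input: ", src)
    print("target:", tgt)
    print(final_distribution({"the": 0.5, "cat": 0.5}, [0.25, 0.75], ["cat", "sat"], 0.4))
```

In the last call, "sat" is absent from `p_vocab` but still receives probability (0.6 * 0.75 = 0.45) through the copy term. That is the property that makes copying attractive for normalization, where many output tokens appear verbatim in the spoken input.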