IEEE International Conference on Acoustics, Speech and Signal Processing

Enhancing Hybrid Self-attention Structure with Relative-position-aware Bias for Speech Synthesis

Abstract

Compared with the conventional "front-end"-"back-end"-"vocoder" pipeline, attention-based end-to-end speech synthesis systems are trained as a whole and map the text sequence directly to the acoustic feature sequence. Recently, a more computationally efficient end-to-end architecture named Transformer, based solely on self-attention, was proposed to model global dependencies between the input and output sequences. However, despite its many advantages, the Transformer lacks position information in its structure. Moreover, the weighted-sum form of self-attention may disperse attention over the whole input sequence rather than focusing it on the more important neighbouring positions. To solve these problems, this paper introduces a hybrid self-attention structure that combines self-attention with recurrent neural networks (RNNs). We further enhance the proposed structure with relative-position-aware biases. Mean opinion score (MOS) test results indicate that, enhanced with relative-position-aware biases, the proposed hybrid self-attention system achieves the best performance, with a MOS only 0.11 lower than that of natural recordings.
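The abstract does not give the paper's exact formulation of the relative-position-aware bias or of the hybrid structure. The following is a minimal PyTorch sketch of the general idea only, assuming a learnable scalar bias indexed by clipped relative distance (in the style of Shaw et al.'s relative position representations) added to the attention logits, and a bidirectional LSTM placed before the attention to supply sequential order information. All names here (RelPosBiasSelfAttention, HybridBlock, max_rel_dist) are illustrative, not taken from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F


class RelPosBiasSelfAttention(nn.Module):
    # Single-head self-attention with a learned relative-position bias.
    # Hypothetical sketch: a learnable scalar b[clip(i - j)] is added to
    # each attention logit, so nearby positions can be favoured instead of
    # dispersing attention over the whole input sequence.
    def __init__(self, d_model: int, max_rel_dist: int = 16):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.max_rel_dist = max_rel_dist
        # One learnable bias per clipped relative distance in [-K, K].
        self.rel_bias = nn.Parameter(torch.zeros(2 * max_rel_dist + 1))
        self.scale = d_model ** -0.5

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        logits = torch.matmul(q, k.transpose(-2, -1)) * self.scale
        # Relative-distance index matrix (i - j), clipped to the bias window.
        pos = torch.arange(x.size(1), device=x.device)
        rel = (pos[:, None] - pos[None, :]).clamp(
            -self.max_rel_dist, self.max_rel_dist) + self.max_rel_dist
        logits = logits + self.rel_bias[rel]  # broadcasts over the batch
        return torch.matmul(F.softmax(logits, dim=-1), v)


class HybridBlock(nn.Module):
    # Hypothetical hybrid layer: a bidirectional LSTM injects sequential
    # order information before the biased self-attention is applied.
    def __init__(self, d_model: int):
        super().__init__()
        self.rnn = nn.LSTM(d_model, d_model // 2,
                           batch_first=True, bidirectional=True)
        self.attn = RelPosBiasSelfAttention(d_model)

    def forward(self, x):
        h, _ = self.rnn(x)       # (batch, seq_len, d_model) for even d_model
        return x + self.attn(h)  # residual connection around attention

The bias tensor adds only 2K + 1 parameters per attention layer, and clipping the relative distance lets the same bias generalize to sequence lengths not seen during training; both are standard design choices for relative-position schemes rather than details confirmed by the abstract.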