IEEE International Conference on Acoustics, Speech and Signal Processing

Enhancing Hybrid Self-attention Structure with Relative-position-aware Bias for Speech Synthesis

Abstract

Compared with the conventional "front-end"-"back-end"-"vocoder" pipeline, attention-based end-to-end speech synthesis systems are trained as a whole and map the text sequence directly to the acoustic feature sequence. Recently, a more computationally efficient end-to-end architecture named Transformer, based solely on self-attention, was proposed to model global dependencies between the input and output sequences. However, despite its many advantages, the Transformer lacks position information in its structure. Moreover, the weighted-sum form of self-attention may disperse attention over the whole input sequence rather than focusing it on the more important neighbouring positions. To solve these problems, this paper introduces a hybrid self-attention structure that combines self-attention with recurrent neural networks (RNNs). We further enhance the proposed structure with relative-position-aware biases. Mean opinion score (MOS) test results indicate that, enhanced with relative-position-aware biases, the proposed hybrid self-attention system achieves the best performance, with a MOS only 0.11 lower than that of natural recordings.
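The abstract does not give the paper's exact formulation of the relative-position-aware bias or of the hybrid structure. The following is a minimal PyTorch sketch of the general idea only, assuming a learnable scalar bias indexed by clipped relative distance (in the style of Shaw et al.'s relative position representations) added to the attention logits, and a bidirectional LSTM placed before the attention to supply sequential order information. All names here (RelPosBiasSelfAttention, HybridBlock, max_rel_dist) are illustrative, not taken from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F


class RelPosBiasSelfAttention(nn.Module):
    # Single-head self-attention with a learned relative-position bias.
    # Hypothetical sketch: a learnable scalar b[clip(i - j)] is added to
    # each attention logit, so nearby positions can be favoured instead of
    # dispersing attention over the whole input sequence.
    def __init__(self, d_model: int, max_rel_dist: int = 16):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.max_rel_dist = max_rel_dist
        # One learnable bias per clipped relative distance in [-K, K].
        self.rel_bias = nn.Parameter(torch.zeros(2 * max_rel_dist + 1))
        self.scale = d_model ** -0.5

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        logits = torch.matmul(q, k.transpose(-2, -1)) * self.scale
        # Relative-distance index matrix (i - j), clipped to the bias window.
        pos = torch.arange(x.size(1), device=x.device)
        rel = (pos[:, None] - pos[None, :]).clamp(
            -self.max_rel_dist, self.max_rel_dist) + self.max_rel_dist
        logits = logits + self.rel_bias[rel]  # broadcasts over the batch
        return torch.matmul(F.softmax(logits, dim=-1), v)


class HybridBlock(nn.Module):
    # Hypothetical hybrid layer: a bidirectional LSTM injects sequential
    # order information before the biased self-attention is applied.
    def __init__(self, d_model: int):
        super().__init__()
        self.rnn = nn.LSTM(d_model, d_model // 2,
                           batch_first=True, bidirectional=True)
        self.attn = RelPosBiasSelfAttention(d_model)

    def forward(self, x):
        h, _ = self.rnn(x)       # (batch, seq_len, d_model) for even d_model
        return x + self.attn(h)  # residual connection around attention

The bias tensor adds only 2K + 1 parameters per attention layer, and clipping the relative distance lets the same bias generalize to sequence lengths not seen during training; both are standard design choices for relative-position schemes rather than details confirmed by the abstract.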