
SPECTROGRAM TO WAVEFORM SYNTHESIS USING CONVOLUTIONAL NETWORKS



Abstract

For the problem of waveform synthesis from spectrograms, presented herein are embodiments of an efficient neural network architecture based on transposed convolutions, which achieves high compute intensity and fast inference. In one or more embodiments, the convolutional vocoder architecture is trained with losses related to perceptual audio quality, together with a GAN framework in which a critic that discerns unrealistic waveforms guides training. While yielding high-quality audio, embodiments of the model can synthesize audio more than 500 times faster than real time. Multi-head convolutional neural network (MCNN) embodiments for waveform synthesis from spectrograms are also disclosed. MCNN embodiments enable significantly better utilization of modern multi-core processors than commonly used iterative algorithms such as Griffin-Lim and yield very fast (more than 300× real-time) waveform synthesis. Embodiments herein yield high-quality speech synthesis without any iterative algorithms or autoregression in computations.
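The abstract's two central ideas, upsampling through transposed convolutions and running several heads in parallel, can be illustrated with a minimal pure-Python sketch. It is not the patented architecture. It assumes a single-channel input (a real spectrogram has many frequency channels), no padding, and head outputs that are simply summed, none of which the abstract specifies. The names `transposed_conv1d`, `multi_head_synthesis`, and `real_time_factor` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def transposed_conv1d(x: Sequence[float], kernel: Sequence[float], stride: int) -> List[float]:
    """Single-channel 1-D transposed convolution with no padding.

    Each input sample scatters a scaled copy of the kernel into the output,
    so the output length is (len(x) - 1) * stride + len(kernel).
    """
    out = [0.0] * ((len(x) - 1) * stride + len(kernel))
    for i, value in enumerate(x):
        for k, weight in enumerate(kernel):
            out[i * stride + k] += value * weight
    return out


def multi_head_synthesis(
    frames: Sequence[float],
    head_kernels: Sequence[Sequence[float]],
    stride: int,
) -> List[float]:
    """Run every head independently, then sum the per-head outputs.

    Summation is an assumption made for this sketch. All kernels must have
    the same length so that the head outputs line up sample by sample.
    The heads do not depend on each other, so they could run in parallel.
    """
    outputs = [transposed_conv1d(frames, kernel, stride) for kernel in head_kernels]
    return [sum(samples) for samples in zip(*outputs)]


def real_time_factor(audio_seconds: float, compute_seconds: float) -> float:
    """Seconds of audio produced per second of computation."""
    return audio_seconds / compute_seconds


if __name__ == "__main__":
    frames = [1.0, 2.0, 3.0]          # toy stand-in for a spectrogram time axis
    heads = [[1.0, 0.5], [0.0, 0.5]]  # two hypothetical heads, kernel length 2
    print(multi_head_synthesis(frames, heads, stride=2))
    # 10 minutes of audio synthesized in 2 seconds of compute
    print(real_time_factor(600.0, 2.0))
```

With a stride of 2, three input frames become six output samples. A trained vocoder would stack several such layers until the frame rate reaches the audio sample rate. Unlike Griffin-Lim, nothing in this forward pass iterates toward a solution or feeds earlier output samples back in. The last line shows how a figure such as "300× real-time" is read: 600 seconds of audio produced in 2 seconds of computation.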


Bibliographic data

Publication number: US2019355347A1

Patent type:
Publication date: 2019-11-21

Original format: PDF
Applicant/Assignee: BAIDU USA LLC

Application number: US201916365673
Inventors: SERCAN ARIK; HEE WOO JUN; ERIC UNDERSANDER; GREGORY DIAMOS

Filing date: 2019-03-27
Classification: G10L15/06; G10L15/16; G10L25/18
Country: US
Added to database: 2022-08-21 11:20:30
