
JOINT AUTOMATIC SPEECH RECOGNITION AND TEXT TO SPEECH CONVERSION USING ADVERSARIAL NEURAL NETWORKS



Abstract

An end-to-end deep-learning-based system that can solve both ASR and TTS problems jointly using unpaired text and audio samples is disclosed herein. An adversarially-trained approach is used to generate a more robust independent TTS neural network and an ASR neural network that can be deployed individually or simultaneously. The process for training the neural networks includes generating an audio sample from a text sample using the TTS neural network, then feeding the generated audio sample into the ASR neural network to regenerate the text. The difference between the regenerated text and the original text is used as a first loss for training the neural networks. A similar process is used for an audio sample. The difference between the regenerated audio and the original audio is used as a second loss. Text and audio discriminators are similarly used on the output of the neural network to generate additional losses for training.
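The loss structure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the "networks" are toy linear maps over plain Python lists, and every name here (`linear`, `discriminator`, `joint_losses`, the vector sizes, the equal loss weighting, and the non-saturating `-log D(x)` form of the adversarial terms) is an assumption made for illustration. The abstract does not specify which outputs the discriminators score; the sketch assumes they score the generated audio and the generated text.

```python
import math
import random


def linear(weights, x):
    """Toy stand-in for a network: one dense layer, one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def mse(a, b):
    """Mean squared difference between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def discriminator(weights, x):
    """Toy discriminator: estimated probability that x is a real sample."""
    score = sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-score))


def joint_losses(text, audio, tts_w, asr_w, d_text_w, d_audio_w):
    """Compute the two cycle losses and two adversarial losses for one
    unpaired (text, audio) pair. Equal weighting of the terms is assumed."""
    # Text cycle: text -> TTS -> generated audio -> ASR -> regenerated text.
    fake_audio = linear(tts_w, text)
    regenerated_text = linear(asr_w, fake_audio)
    text_cycle = mse(regenerated_text, text)        # "first loss"

    # Audio cycle: audio -> ASR -> generated text -> TTS -> regenerated audio.
    fake_text = linear(asr_w, audio)
    regenerated_audio = linear(tts_w, fake_text)
    audio_cycle = mse(regenerated_audio, audio)     # "second loss"

    # Generator-side adversarial terms (assumed form): low when the
    # discriminator rates the generated sample as real.
    text_adv = -math.log(max(discriminator(d_text_w, fake_text), 1e-12))
    audio_adv = -math.log(max(discriminator(d_audio_w, fake_audio), 1e-12))

    return {
        "text_cycle": text_cycle,
        "audio_cycle": audio_cycle,
        "text_adv": text_adv,
        "audio_adv": audio_adv,
        "total": text_cycle + audio_cycle + text_adv + audio_adv,
    }


# Hypothetical sizes and random toy parameters, for demonstration only.
TEXT_DIM, AUDIO_DIM = 4, 6
rng = random.Random(0)


def make_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


tts_w = make_matrix(AUDIO_DIM, TEXT_DIM)   # text vector -> audio vector
asr_w = make_matrix(TEXT_DIM, AUDIO_DIM)   # audio vector -> text vector
d_text_w = [rng.uniform(-0.5, 0.5) for _ in range(TEXT_DIM)]
d_audio_w = [rng.uniform(-0.5, 0.5) for _ in range(AUDIO_DIM)]

# The text and audio samples are unpaired: they need not correspond.
text_sample = [rng.uniform(-1.0, 1.0) for _ in range(TEXT_DIM)]
audio_sample = [rng.uniform(-1.0, 1.0) for _ in range(AUDIO_DIM)]

losses = joint_losses(text_sample, audio_sample, tts_w, asr_w, d_text_w, d_audio_w)
```

In a real system the linear maps would be replaced by sequence models, the discriminators would be trained in alternation with the TTS and ASR networks, and the total would be minimized by gradient descent; none of those details are given in the abstract.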


Bibliographic data

Publication number: US2022005457A1

Patent type:
Publication date: 2022-01-06

Original format: PDF
Applicant/assignee: FORD GLOBAL TECHNOLOGIES LLC

Application number: US202016919315
Inventors: KAUSHIK BALAKRISHNAN; PRAVEEN NARAYANAN; FRANCOIS CHARETTE

Filing date: 2020-07-02
Classification: G10L13/047; G10L15/16
Country: US
Date indexed: 2022-08-24 23:14:45
