JOINT AUTOMATIC SPEECH RECOGNITION AND TEXT TO SPEECH CONVERSION USING ADVERSARIAL NEURAL NETWORKS
Abstract
An end-to-end deep-learning-based system that can solve both automatic speech recognition (ASR) and text-to-speech (TTS) problems jointly using unpaired text and audio samples is disclosed herein. An adversarially trained approach is used to generate a more robust independent TTS neural network and an ASR neural network that can be deployed individually or simultaneously. The process for training the neural networks includes generating an audio sample from a text sample using the TTS neural network, then feeding the generated audio sample into the ASR neural network to regenerate the text. The difference between the regenerated text and the original text is used as a first loss for training the neural networks. A similar process is used for an audio sample, and the difference between the regenerated audio and the original audio is used as a second loss. Text and audio discriminators are similarly used on the outputs of the neural networks to generate additional losses for training.
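The loss structure described in the abstract can be sketched as follows. This is only an illustrative sketch, not the patented implementation. The assumptions: text and audio are both represented as plain vectors of floats; the "difference" is taken to be mean squared error; the audio cycle is read as audio to ASR to TTS and back to audio; and each discriminator returns a probability that its input is real, turned into a loss with a negative log. The names `joint_losses`, `mse`, and `generator_adv_loss`, along with the toy `tts`, `asr`, and discriminator functions, are hypothetical and do not come from the source.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def generator_adv_loss(score: float) -> float:
    """Assumed adversarial loss: small when the discriminator's
    'real' probability for a generated sample is close to 1."""
    return -math.log(max(score, 1e-12))


def joint_losses(
    text: Vector,
    audio: Vector,
    tts: Callable[[Vector], Vector],
    asr: Callable[[Vector], Vector],
    text_disc: Callable[[Vector], float],
    audio_disc: Callable[[Vector], float],
) -> Dict[str, float]:
    # First loss: text -> TTS -> audio -> ASR -> regenerated text.
    generated_audio = tts(text)
    text_cycle = mse(asr(generated_audio), text)

    # Second loss (assumed direction): audio -> ASR -> text -> TTS -> audio.
    generated_text = asr(audio)
    audio_cycle = mse(tts(generated_text), audio)

    # Additional losses from discriminators applied to the generated outputs.
    return {
        "text_cycle": text_cycle,
        "audio_cycle": audio_cycle,
        "audio_adv": generator_adv_loss(audio_disc(generated_audio)),
        "text_adv": generator_adv_loss(text_disc(generated_text)),
    }


# Toy stand-ins (hypothetical): TTS doubles each value, ASR halves it,
# so the two networks are exact inverses and both cycle losses vanish.
toy_tts = lambda t: [2.0 * x for x in t]
toy_asr = lambda a: [0.5 * x for x in a]
undecided_disc = lambda sample: 0.5  # discriminator that cannot tell real from fake

losses = joint_losses(
    text=[1.0, 2.0, 3.0],
    audio=[0.5, -1.0, 2.0],
    tts=toy_tts,
    asr=toy_asr,
    text_disc=undecided_disc,
    audio_disc=undecided_disc,
)
total = sum(losses.values())
```

In a real system the two networks would be trained by backpropagating some weighted combination of these terms, and the discriminators would be trained in alternation with them; the abstract does not specify the weighting or the schedule, so the unweighted sum here is a placeholder.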