
DEEP LEARNING MODELS FOR SPEECH RECOGNITION



Abstract

Presented herein are embodiments of state-of-the-art speech recognition systems developed using end-to-end deep learning. In embodiments, the model architecture is significantly simpler than that of traditional speech systems, which rely on laboriously engineered processing pipelines and tend to perform poorly in noisy environments. In contrast, embodiments of the system do not need hand-designed components to model background noise, reverberation, or speaker variation; instead, they directly learn a function that is robust to such effects. Neither a phoneme dictionary nor even the concept of a “phoneme” is needed. Embodiments include a well-optimized recurrent neural network (RNN) training system that can use multiple GPUs, as well as a set of novel data synthesis techniques that allow a large amount of varied training data to be obtained efficiently. Embodiments of the system can also handle challenging noisy environments better than widely used, state-of-the-art commercial speech systems.
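
The abstract does not say which data synthesis techniques are used. As one illustrative possibility only, the sketch below assumes additive noise mixing: a noise recording is scaled and added to clean speech so that the result has a chosen signal-to-noise ratio (SNR). The function names `mix_at_snr` and `measured_snr_db`, the synthetic tone, and the 10 dB target are all hypothetical and are not taken from the patent.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus scaled noise so the mix has the requested SNR in dB."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    if speech_power == 0 or noise_power == 0:
        raise ValueError("both signals must have non-zero power")
    # Pick the gain so that speech_power / (gain^2 * noise_power) == 10^(snr_db / 10).
    gain = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, mixed):
    """SNR of a mix, given the clean speech it was built from."""
    residual = [m - s for m, s in zip(mixed, speech)]
    speech_power = sum(s * s for s in speech)
    residual_power = sum(r * r for r in residual)
    return 10 * math.log10(speech_power / residual_power)


rng = random.Random(0)
# Stand-in signals (assumption): a 440 Hz tone as "speech" and Gaussian noise,
# one second each at a 16 kHz sample rate.
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]
noisy = mix_at_snr(clean, noise, snr_db=10.0)
```

Sweeping `snr_db` and swapping in different noise recordings would turn one clean utterance into many varied training examples, which is the general idea behind synthesizing noisy data rather than collecting it.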


Bibliographic details

Publication number: US2019371298A1

Patent type:
Publication date: 2019-12-05

Original format: PDF
Applicant/patentee: BAIDU USA LLC

Application number: US201916542243
Inventors: AWNI HANNUN; CARL CASE; JARED CASPER; BRYAN CATANZARO; GREGORY DIAMOS; ERICH ELSEN; RYAN PRENGER; SANJEEV SATHEESH; SHUBHABRATA SENGUPTA; ADAM COATES; ANDREW NG

Filing date: 2019-08-15
Classification: G10L15/06
Country: US
Date indexed: 2022-08-21 11:18:39
