Estimating speaker-specific affine transforms for neural network based speech recognition systems

Abstract

Features are disclosed for estimating affine transforms in Log Filter-Bank Energy Space (“LFBE” space) in order to adapt artificial neural network-based acoustic models to a new speaker or environment. Neural network-based acoustic models may be trained using concatenated LFBEs as input features. The affine transform may be estimated by minimizing the least squares error between corresponding linear and bias transform parts for the resultant neural network feature vector and some standard speaker-specific feature vector obtained for a GMM-based acoustic model using constrained Maximum Likelihood Linear Regression (“cMLLR”) techniques. Alternatively, the affine transform may be estimated by minimizing the least squares error between the resultant transformed neural network feature and some standard speaker-specific feature obtained for a GMM-based acoustic model.
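As a rough illustration of the second variant described above, the following Python sketch fits an affine transform (A, b) by ordinary least squares so that A x + b approximates a target feature y. It assumes, purely for illustration, that the LFBE input and the target speaker-specific feature have the same dimension and that any fixed front-end between them is omitted. The function name `estimate_affine_lstsq` and the synthetic data are hypothetical and are not taken from the patent.

```python
import numpy as np


def estimate_affine_lstsq(x, y):
    """Return (A, b) minimizing sum_t ||A @ x[t] + b - y[t]||^2.

    x: (n_frames, d_in) input features (stand-in for LFBE frames).
    y: (n_frames, d_out) target features (stand-in for the
       speaker-specific features of a GMM-based system).
    """
    n_frames = x.shape[0]
    # Append a constant 1 to every frame so the bias is fitted jointly.
    x_aug = np.hstack([x, np.ones((n_frames, 1))])
    # Solve x_aug @ w ~= y in the least squares sense; w is (d_in + 1, d_out).
    w, *_ = np.linalg.lstsq(x_aug, y, rcond=None)
    A = w[:-1].T  # linear part, shape (d_out, d_in)
    b = w[-1]     # bias part, shape (d_out,)
    return A, b


# Synthetic check (hypothetical data): targets come from a known affine map.
rng = np.random.default_rng(0)
d = 4
x = rng.normal(size=(200, d))
A_true = rng.normal(size=(d, d))
b_true = rng.normal(size=d)
y = x @ A_true.T + b_true

A_est, b_est = estimate_affine_lstsq(x, y)
```

On this noiseless synthetic data the fitted pair should match `A_true` and `b_true` up to numerical precision. With real features the fit would only be approximate, and the first variant in the abstract (matching the linear and bias parts of two transforms directly) would need a different objective from the one sketched here.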

Bibliographic details

Publication number: US9378735B1

Publication date: 2016-06-28

Original format: PDF
Applicant/assignee: AMAZON TECHNOLOGIES INC.

Application number: US201314135474
Inventors: SRI VENKATA SURYA SIVA RAMA KRISHNA GARIMELLA; NIKKO STROM; BJORN HOFFMEISTER

Filing date: 2013-12-19
Classification: G10L15/16; G10L15/06; G10L15/20; G10L13/08
Country: US
Database entry time: 2022-08-21 14:29:31