
Estimating speaker-specific affine transforms for neural network based speech recognition systems


Abstract

Features are disclosed for estimating affine transforms in Log Filter-Bank Energy Space (“LFBE” space) in order to adapt artificial neural network-based acoustic models to a new speaker or environment. Neural network-based acoustic models may be trained using concatenated LFBEs as input features. The affine transform may be estimated by minimizing the least squares error between corresponding linear and bias transform parts for the resultant neural network feature vector and some standard speaker-specific feature vector obtained for a GMM-based acoustic model using constrained Maximum Likelihood Linear Regression (“cMLLR”) techniques. Alternatively, the affine transform may be estimated by minimizing the least squares error between the resultant transformed neural network feature and some standard speaker-specific feature obtained for a GMM-based acoustic model.
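The second estimation variant described above (minimizing the least squares error between the transformed neural network feature and a reference speaker-specific feature) can be illustrated with a minimal sketch. This is not the patented procedure itself: it assumes frame-aligned LFBE features and target features are already available as NumPy arrays, and it ignores any mapping between LFBE space and the GMM feature space. The function names `estimate_affine_transform` and `apply_affine_transform` are hypothetical and chosen only for illustration.

```python
import numpy as np


def estimate_affine_transform(features, targets):
    """Fit A and b so that features @ A.T + b approximates targets
    in the least-squares sense.

    features: (n_frames, d_in) array, e.g. LFBE vectors (assumed input).
    targets:  (n_frames, d_out) array of reference speaker-specific
              features (assumed to be frame-aligned with `features`).
    """
    n_frames = features.shape[0]
    # Append a column of ones so the bias is estimated jointly
    # with the linear part of the transform.
    augmented = np.hstack([features, np.ones((n_frames, 1))])
    # Solve min ||augmented @ W - targets||^2 for W of shape (d_in + 1, d_out).
    solution, *_ = np.linalg.lstsq(augmented, targets, rcond=None)
    A = solution[:-1].T   # linear part, shape (d_out, d_in)
    b = solution[-1]      # bias part, shape (d_out,)
    return A, b


def apply_affine_transform(features, A, b):
    """Apply the estimated transform to every frame."""
    return features @ A.T + b
```

In this sketch the linear and bias parts are solved in one step by augmenting the input with a constant column, a common way to pose an affine fit as an ordinary linear least-squares problem. With few adaptation frames the problem can be poorly conditioned, so a practical system would likely add regularization, which is omitted here.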
