
A speech processing system that applies speaker adaptation techniques into an environment mismatch function


Abstract

A speech processing method comprises receiving a speech input, which comprises a sequence of feature vectors, and determining the likelihood of a sequence of words arising from that sequence of feature vectors using an acoustic model and a language model. The method provides an acoustic model for performing speech recognition on an input signal comprising a sequence of feature vectors, the model having a plurality of model parameters relating to the probability distribution of a word, or part of a word, being related to a feature vector. The speech input is a mismatched speech input: it is received from a speaker, or in an environment, that does not match the speaker or environment under which the acoustic model was trained, and the acoustic model is adapted to this mismatched speech input. The method further comprises determining the likelihood of a sequence of features occurring in a given language using the language model, combining the likelihoods determined by the acoustic model and the language model, and outputting a sequence of words identified from the speech input signal. Adapting the acoustic model to the mismatched speech input comprises relating speech from the mismatched input to the speech used to train the acoustic model using a mismatch function f, which primarily models differences between the environment of the speaker and the environment under which the acoustic model was trained, and a speaker transform F, which primarily models differences between the speaker of the mismatched input and the speaker(s) whose speech was used to train the acoustic model, such that:

y = f(F(x, v), u)

where y represents the speech from the mismatched input, x is the speech used to train the acoustic model, u represents at least one parameter for modelling changes in the environment, and v represents at least one parameter for mapping differences between speakers. The parameters u and v are estimated jointly.
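The composition y = f(F(x, v), u) and the joint estimation of u and v can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not state: f is taken to be the common log-spectral noise/channel mismatch y = log(exp(z + h) + exp(n)) with u = (n, h), and F is a hypothetical per-dimension scaling F(x, v) = a·x with v = (a,). The names `speaker_transform`, `mismatch`, `corrupt`, `sse` and `joint_estimate` are illustrative only. Estimation is a brute-force least-squares grid search over paired clean/mismatched values, standing in for the statistical estimation a real recognizer would perform on acoustic model parameters.

```python
import math
from itertools import product


def speaker_transform(x, v):
    """Hypothetical speaker transform F(x, v): per-dimension scaling, v = (a,)."""
    (a,) = v
    return a * x


def mismatch(z, u):
    """Assumed log-spectral mismatch f(z, u) with u = (n, h):
    y = log(exp(z + h) + exp(n)), i.e. channel h plus additive noise n."""
    n, h = u
    s = z + h
    m = max(s, n)
    # Numerically stable log-sum-exp of s and n.
    return m + math.log(math.exp(s - m) + math.exp(n - m))


def corrupt(x, v, u):
    """Map training-condition speech x to mismatched speech y = f(F(x, v), u)."""
    return mismatch(speaker_transform(x, v), u)


def sse(xs, ys, v, u):
    """Sum of squared errors between predicted and observed mismatched values."""
    return sum((corrupt(x, v, u) - y) ** 2 for x, y in zip(xs, ys))


def joint_estimate(xs, ys, a_grid, n_grid, h_grid):
    """Search speaker (a) and environment (n, h) parameters together,
    returning the combination with the smallest squared error."""
    a, n, h = min(
        product(a_grid, n_grid, h_grid),
        key=lambda p: sse(xs, ys, (p[0],), (p[1], p[2])),
    )
    return (a,), (n, h)


# Synthetic demo: "clean" log-spectral values and their mismatched versions.
xs = [i / 2 for i in range(21)]          # 0.0 .. 10.0
true_v, true_u = (0.9,), (2.0, 0.5)      # a = 0.9; n = 2.0, h = 0.5
ys = [corrupt(x, true_v, true_u) for x in xs]

v_hat, u_hat = joint_estimate(
    xs, ys,
    a_grid=[i / 20 for i in range(16, 25)],  # 0.80 .. 1.20
    n_grid=[i / 2 for i in range(9)],        # 0.0 .. 4.0
    h_grid=[i / 4 for i in range(-4, 5)],    # -1.0 .. 1.0
)
print(v_hat, u_hat)  # -> (0.9,) (2.0, 0.5)
```

Searching a, n and h together, rather than fixing the speaker parameter and fitting only the environment, is what makes the estimate joint in this sketch. With real recordings, paired clean/mismatched data would not normally be available, and the grid search would be replaced by an iterative estimator.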
