A speech processing system that applies speaker adaptation techniques into an environment mismatch function
Abstract
A speech processing method comprises receiving a speech input comprising a sequence of feature vectors and determining the likelihood of a sequence of words arising from that sequence of feature vectors, using an acoustic model and a language model. An acoustic model is provided for performing speech recognition on an input signal comprising a sequence of feature vectors; the model has a plurality of model parameters relating to the probability distribution of a word, or part thereof, being related to a feature vector. The speech input is a mismatched speech input, received from a speaker in an environment that does not match the speaker or environment under which the acoustic model was trained, and the acoustic model is adapted to this mismatched speech input. The method further comprises determining the likelihood of a sequence of features occurring in a given language using a language model, combining the likelihoods determined by the acoustic model and the language model, and outputting a sequence of words identified from the speech input signal. Adapting the acoustic model to the mismatched speech input comprises relating speech from the mismatched input to the speech used to train the acoustic model using a mismatch function f, which primarily models differences between the environment of the speaker and the environment under which the acoustic model was trained, and a speaker transform F, which primarily models differences between the speaker of the mismatched input and the speakers of the training speech, such that y = f(F(x, v), u), where y represents the speech from the mismatched speaker input, x is the speech used to train the acoustic model, u represents at least one parameter for modelling changes in the environment, and v represents at least one parameter for mapping differences between speakers. The parameters u and v are estimated jointly.
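The composition y = f(F(x, v), u) and the joint estimation of u and v can be illustrated with a small toy sketch. The abstract does not specify the form of f or F, so everything concrete here is an assumption: F is taken to be a per-dimension affine transform with v = (a, b), f is taken to be a log-domain channel offset plus additive noise with u = (h, n), and all function names are hypothetical. The fit below uses paired frames (x, y) and a least-squares loss purely for illustration; it does not adapt acoustic model parameters as the method describes.

```python
import numpy as np

def speaker_transform(x, v):
    """F(x, v): ASSUMED per-dimension affine speaker transform, v = (a, b)."""
    a, b = v
    return a * x + b

def mismatch_function(z, u):
    """f(z, u): ASSUMED log-domain environment model, u = (h, n).

    h is a channel offset and n an additive-noise level (both in the log domain).
    """
    h, n = u
    return np.logaddexp(z + h, n)

def observe(x, v, u):
    """y = f(F(x, v), u), the composition stated in the abstract."""
    return mismatch_function(speaker_transform(x, v), u)

def unpack(theta, dim):
    """Split one flat parameter vector into v = (a, b) and u = (h, n)."""
    a, b, h, n = theta.reshape(4, dim)
    return (a, b), (h, n)

def loss(theta, x, y):
    """Toy objective: mean squared error between predicted and observed y."""
    v, u = unpack(theta, x.shape[1])
    return float(np.mean((observe(x, v, u) - y) ** 2))

def numerical_grad(theta, x, y, eps=1e-5):
    """Central-difference gradient with respect to all of (v, u) at once."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        grad[i] = (loss(theta + step, x, y) - loss(theta - step, x, y)) / (2 * eps)
    return grad

def joint_estimate(x, y, iters=50):
    """Update u and v together by gradient descent with backtracking."""
    dim = x.shape[1]
    # Start from an identity speaker transform, no channel offset, low noise.
    theta = np.concatenate(
        [np.ones(dim), np.zeros(dim), np.zeros(dim), np.full(dim, -5.0)]
    )
    current = loss(theta, x, y)
    for _ in range(iters):
        g = numerical_grad(theta, x, y)
        lr = 1.0
        while lr > 1e-8:
            candidate = theta - lr * g
            cand_loss = loss(candidate, x, y)
            if cand_loss < current:  # accept only steps that reduce the loss
                theta, current = candidate, cand_loss
                break
            lr *= 0.5
        else:
            break  # no improving step found; stop early
    return unpack(theta, dim), current

# Synthetic "training" frames x and mismatched frames y (illustrative only).
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))
true_v = (np.full(3, 1.2), np.full(3, 0.3))
true_u = (np.full(3, -0.5), np.full(3, -1.0))
y = observe(x, true_v, true_u)

(v_hat, u_hat), final_loss = joint_estimate(x, y)
print("final toy loss:", final_loss)
```

In this toy parameterisation the speaker bias b and the channel offset h enter only through their sum, so they cannot be separated from the data alone. That is one reason the two parameter sets are updated together in a single vector here rather than fitted one after the other.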