A modified hierarchical mixtures of experts (HME) architecture is presented for text-dependent speaker identification. A new gating network is added to the original HME architecture so that both instantaneous and transitional spectral information can be used. The statistical model underlying the proposed architecture is presented, and learning is treated as a maximum likelihood problem; in particular, an expectation-maximization (EM) algorithm is proposed for adjusting the parameters of the architecture. The architecture is evaluated on a database of isolated digit utterances from 10 male speakers. Experimental results show that it outperforms the original HME architecture in text-dependent speaker identification.