State-of-the-art language identification (LID) systems are based on phone recognisers and n-gram language models, which require transcribed speech databases for training. An alternative solution to the LID problem applies mixed-order hidden Markov models (HMMs) directly to untranscribed speech. The competitive performance of these mixed-order HMMs on the NIST 1996 evaluation set is very promising, given their ease of implementation and the many improvements still possible. This result validates a novel mixed-order HMM training procedure and extends previous results obtained with high-order HMMs to take advantage of larger datasets.