A method includes performing, using at least one processor, feature extraction of input audio data to identify extracted features associated with the input audio data. The method also includes detecting, using the at least one processor, a language associated with each of multiple portions of the input audio data by processing the extracted features using a plurality of language models, where each language model is associated with a different language. In addition, the method includes directing, using the at least one processor, each portion of the input audio data to one of a plurality of automatic speech recognition (ASR) models based on the language associated with the portion of the input audio data.
展开▼