In an example embodiment, there is disclosed herein an automatic speech recognition (ASR) system that employs speaker clustering (or speaker type) for transcribing audio. A large corpus of audio with corresponding transcripts is analyzed to determine a plurality of speaker types (e.g., dialects). The ASR system is trained for each speaker type. Upon encountering a new user, the ASR system attempts to map the user to a speaker type. After the new user is mapped to a speaker type, the ASR employs the speaker type for transcribing audio from the new user.
展开▼