Current automatic speech recognition (ASR) research is focused on recognition of continuous, spontaneous speech. Spontaneous speech contains a lot of variability in the way words are pronounced, and canonical pronunciations of each word are not true to the variation that is seen in real data.

Two of the components of an ASR system are acoustic models and pronunciation models. The variation within spontaneous speech must be accounted for by these components. Phones, or context-dependent phones, are typically used as the base sub-word unit, and one acoustic model is trained for each sub-word unit. Pronunciation modelling largely takes place in a dictionary, which relates words to sequences of phones. Acoustic modelling and pronunciation modelling overlap, and the two are not clearly separable in modelling pronunciation variation. Techniques that find pronunciation variants in the data and then reflect these in the dictionary have not provided the expected gains in recognition.

An alternative approach to modelling pronunciations in terms of phones is to derive units automatically: using data-driven methods to determine an inventory of sub-word units, their acoustic models, and their relationship to words. This thesis presents a method for the automatic derivation of a sub-word unit inventory, whose main components are:

1. automatic and simultaneous generation of a sub-word unit inventory and acoustic model set, using an ergodic hidden Markov model whose complexity is controlled using the Bayesian Information Criterion
2. automatic generation of probabilistic dictionaries using joint multigrams

The prerequisites of this approach are fewer than in previous work on unit derivation; notably, the timings of word boundaries are not required here. The approach is language independent since it is entirely data-driven and no linguistic information is required. The dictionary generation method outperforms a supervised method using phonetic data.
The automatically derived units and dictionary perform reasonably on a small spontaneous speech task, although not yet outperforming phones.
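
As an illustration of how the Bayesian Information Criterion can control the complexity of an ergodic hidden Markov model, the following minimal sketch picks the number of states that gives the best penalised likelihood. It is not the thesis's implementation: the single diagonal-covariance Gaussian per state, the function names, the `penalty_weight` parameter, and the log-likelihood values in the usage lines are all assumptions made for the example.

```python
import math


def ergodic_hmm_param_count(n_states, feat_dim):
    """Free parameters of a fully connected (ergodic) HMM.

    Assumes one diagonal-covariance Gaussian per state; this topology
    is an assumption of the sketch, not taken from the thesis.
    """
    transitions = n_states * (n_states - 1)  # each row sums to one
    initial = n_states - 1                   # initial distribution sums to one
    emissions = n_states * 2 * feat_dim      # a mean and a variance per dimension
    return transitions + initial + emissions


def bic_score(log_likelihood, n_params, n_frames, penalty_weight=1.0):
    """BIC in its 'higher is better' form: log L - (w / 2) * k * log N."""
    return log_likelihood - 0.5 * penalty_weight * n_params * math.log(n_frames)


def select_by_bic(candidates, feat_dim, n_frames):
    """Return the state count with the highest BIC, plus all scores.

    `candidates` maps a number of states to the training-data
    log-likelihood of the model with that many states.
    """
    scores = {
        n_states: bic_score(
            log_lik, ergodic_hmm_param_count(n_states, feat_dim), n_frames
        )
        for n_states, log_lik in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Hypothetical log-likelihoods, for illustration only.
    hypothetical = {2: -1200.0, 3: -1100.0, 4: -1090.0}
    best, scores = select_by_bic(hypothetical, feat_dim=3, n_frames=1000)
    print(best, scores)
```

The penalty term grows with the number of free parameters, so adding a state is accepted only when the gain in likelihood outweighs the cost of the extra parameters; in this setting each state of the selected model corresponds to one candidate sub-word unit.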