The statistical and syntactic approaches to the modelling of language are consolidated in order to improve performance in speech recognition. The authors also aim to minimise the need for human intervention in the training of the language model from a corpus. Hybrid speech recognition systems using both bigram and grammar models can yield improved performance compared with the use of either model alone, but performance is still sub-optimal because the grammar is abandoned completely for sentences which fail to parse overall. Extending the concept of a bigram to the most informative (rather than the immediate) previous word leads to a reduction in perplexity: a purely statistical approach is presented. Incorporating syntax from a substring parser will require these principles to be extended to strings of nonterminal symbols, raising important training issues but opening the way towards a language model with greater capacity for adaptive enhancement of performance.
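The idea of conditioning on the most informative rather than the immediate previous word can be sketched as follows. This is a simplified illustration, not the authors' method: it uses an add-one-smoothed bigram model and, for each position, simply takes the best-predicting word from a window of the previous `k` words (`k=1` recovers the ordinary bigram model). The toy corpus and all function names are invented for the example; perplexity here is measured on the training text itself.

```python
import math
from collections import Counter

# Toy corpus (invented for illustration).
corpus = "the cat sat on the mat the cat ate the fish".split()

vocab = set(corpus)
V = len(vocab)
unigram = Counter(corpus)
bigram = Counter(zip(corpus, corpus[1:]))

def p_given(w, prev):
    # Add-one-smoothed conditional probability p(w | prev).
    return (bigram[(prev, w)] + 1) / (unigram[prev] + V)

def perplexity(words, k=1):
    # Condition each word on whichever of the previous k words
    # predicts it best; k=1 is the standard bigram model.
    log_prob = 0.0
    n = 0
    for i in range(1, len(words)):
        window = words[max(0, i - k):i]
        log_prob += math.log(max(p_given(words[i], prev) for prev in window))
        n += 1
    return math.exp(-log_prob / n)

pp_bigram = perplexity(corpus, k=1)
pp_informative = perplexity(corpus, k=3)
```

Because the window of candidate predecessors for `k=3` contains the immediate previous word, the maximum can only increase, so `pp_informative <= pp_bigram` by construction; the paper's contribution lies in learning which previous word is informative rather than taking a per-token maximum as this sketch does.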