In this paper we describe procedures for combining multiple acoustic models, obtained using training corpora from different languages, in order to improve automatic speech recognition (ASR) performance in languages for which large amounts of training data are not available. We treat these models as multiple sources of information whose scores are combined in a log-linear model to compute the hypothesis likelihood. The model combination can be performed either statically, with constant combination weights, or dynamically, with parameters that can vary across different segments of a hypothesis. The aim is to optimize the parameters so as to achieve minimum word error rate (WER). To achieve robust parameter estimation in the dynamic combination case, the parameters are defined to be piecewise constant on different phonetic classes that form a partition of the space of hypothesis segments. The partition is defined, using phonological knowledge, on segments that correspond to hypothesized phones. We examine different ways to define such a partition, including an automatic approach that gives a binary-tree-structured partition which tries to achieve the minimum WER with the minimum number of classes.
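The difference between static and dynamic combination can be illustrated with a minimal sketch. It is not the authors' implementation. The data layout (a hypothesis as a list of phone segments, each holding a phonetic class label `cls` and one log-score per acoustic model in `scores`), the function names, the class names, and all numbers are assumptions made purely for illustration.

```python
from typing import Dict, List, Sequence

# Hypothetical layout: one dict per hypothesized phone segment, with a
# phonetic class label ("cls") and one log-score per acoustic model ("scores").
Segment = Dict[str, object]


def static_score(segments: List[Segment], weights: Sequence[float]) -> float:
    """Log-linear score with a single constant weight per model."""
    total = 0.0
    for seg in segments:
        total += sum(w * s for w, s in zip(weights, seg["scores"]))
    return total


def dynamic_score(segments: List[Segment],
                  class_weights: Dict[str, Sequence[float]]) -> float:
    """Log-linear score with weights that are piecewise constant:
    every segment in the same phonetic class shares one weight vector."""
    total = 0.0
    for seg in segments:
        weights = class_weights[seg["cls"]]
        total += sum(w * s for w, s in zip(weights, seg["scores"]))
    return total


# Illustrative hypothesis with two segments scored by two models
# (all values are made up).
hypothesis = [
    {"cls": "vowel", "scores": [-1.0, -2.0]},
    {"cls": "consonant", "scores": [-3.0, -4.0]},
]

static_weights = [1.0, 0.5]
per_class_weights = {"vowel": [1.0, 0.0], "consonant": [0.0, 1.0]}

print(static_score(hypothesis, static_weights))
print(dynamic_score(hypothesis, per_class_weights))
```

In this sketch the weights are given by hand. In the setting described above they would be estimated to minimize WER, and the set of classes would come either from phonological knowledge or from the automatically grown binary tree.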