HMM-based large vocabulary speech recognition systems usually have a very large number of statistical parameters. For more robust estimation, the number of parameters is reduced by sharing them across models. The parameter sharing is decided by regression trees, which are built using phonetic classes designed either by a human expert or by data-driven methods. In situations where neither of these is reliable, it may be useful to have techniques for non-decision-tree-based state tying that perform comparably to those based on traditional methods. In this paper we propose two methods for non-decision-tree-based parameter learning in HMM-based systems. In the first method (context-dependent state tying), we restructure acoustic models to explicitly capture the transitions between phones in continuous speech. In the second method (transition-based subword units), we redefine the basic sound units used to model speech so that transitions between sounds are modeled explicitly. Experiments show that context-dependent state tying is a viable option for large vocabulary systems. They also show that using transition-based subword units can improve performance on spontaneous speech.
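For illustration, the decision-tree state tying that this paper seeks an alternative to can be sketched as follows. This is a toy example under assumed conventions: the question set, the `tie_state` function, and the label format are invented for the sketch, not taken from the paper; real systems grow the tree greedily by likelihood gain over training statistics.

```python
# Toy sketch of phonetic-question-based state tying (illustrative only).
# A phonetic question asks whether the left or right context phone of a
# triphone belongs to a linguist-designed class; triphone states that
# answer all questions identically are tied to one shared state.

QUESTIONS = [
    ("left", {"m", "n", "ng"}),   # is the left context a nasal?
    ("right", {"p", "t", "k"}),   # is the right context an unvoiced stop?
]

def tie_state(triphone):
    """Map a triphone (left, center, right) to a tied-state label by
    answering each phonetic question in turn."""
    left, center, right = triphone
    answers = []
    for side, phone_class in QUESTIONS:
        ctx = left if side == "left" else right
        answers.append("y" if ctx in phone_class else "n")
    return center + "-" + "".join(answers)

# Triphones with identical answers share one tied state, and hence
# share the statistical parameters estimated for that state.
print(tie_state(("m", "ae", "t")))  # -> ae-yy
print(tie_state(("n", "ae", "k")))  # -> ae-yy (tied to the same state)
print(tie_state(("s", "ae", "b")))  # -> ae-nn (a different tied state)
```

The paper's two proposed methods avoid relying on such a question set, which matters when neither expert-designed nor data-driven phonetic classes are reliable.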