We describe the Multinet speech classifier architecture, a framework for combining specialised phone detection networks into a posterior probability estimator for all phones. We explain how individual nets may be trained on different input data representations and time-scales, and how their outputs may nevertheless be combined in a consistent and meaningful manner. We give results showing the benefits of dividing the classification problem in this way by examining the performance of the architecture on plosives.