A new neural tree network (NTN)-based speech recognition system is presented. The NTN is a hierarchical classifier that combines the properties of decision trees and feed-forward neural networks. In the sub-word unit-based system, the NTNs model the sub-word speech segments, while the Viterbi algorithm is used for temporal alignment. A durational probability is associated with each sub-word NTN. An iterative algorithm is proposed for training the sub-word NTNs, in which the sub-word NTN models, as well as the sub-word segment boundaries within a vocabulary word, are re-estimated. Thus, the proposed system is a homogeneous neural network-based, sub-word unit-based speech recognition system. Furthermore, embedded within this word model paradigm, multiple NTNs are trained for each sub-word segment, and their output decisions are combined, or fused, to yield improved performance. The proposed discriminatory training-based system did not perform favourably compared to a hidden Markov model-based system. The paradigm presented in this paper can be argued to represent a class of discriminatory training-based, homogeneous (versus hybrid), sub-word unit-based speech recognition systems. Hence, the results reported here can be generalized to other similar systems.
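The temporal alignment step described above can be illustrated with a minimal sketch. It assumes that each sub-word model has already produced a log score for every frame, and that a word is a fixed left-to-right sequence of sub-word units. The function name `align`, the `frame_logp` matrix, and the `dur_logp` callback are hypothetical names introduced for illustration only; they are not taken from the paper, and the sketch is a generic segmental Viterbi search with a duration term rather than the authors' implementation.

```python
def align(frame_logp, dur_logp):
    """Segment T frames into K consecutive sub-word units.

    frame_logp[k][t]: log score of frame t under sub-word model k (assumed given).
    dur_logp(k, d):   log probability that unit k lasts d frames (assumed given).
    Returns (best_score, bounds), where bounds[k] = (start, end), end exclusive.
    """
    K = len(frame_logp)
    T = len(frame_logp[0])
    if T < K:
        raise ValueError("need at least one frame per sub-word unit")
    neg_inf = float("-inf")

    # Prefix sums make the score of any segment an O(1) lookup.
    prefix = [[0.0] * (T + 1) for _ in range(K)]
    for k in range(K):
        for t in range(T):
            prefix[k][t + 1] = prefix[k][t] + frame_logp[k][t]

    # best[k][t]: best score with the first k units covering the first t frames.
    best = [[neg_inf] * (T + 1) for _ in range(K + 1)]
    back = [[0] * (T + 1) for _ in range(K + 1)]
    best[0][0] = 0.0
    for k in range(1, K + 1):
        for t in range(k, T + 1):
            for s in range(k - 1, t):  # unit k-1 spans frames s..t-1
                if best[k - 1][s] == neg_inf:
                    continue
                seg = prefix[k - 1][t] - prefix[k - 1][s]
                score = best[k - 1][s] + seg + dur_logp(k - 1, t - s)
                if score > best[k][t]:
                    best[k][t] = score
                    back[k][t] = s

    # Trace back the segment boundaries from the last unit to the first.
    bounds = []
    t = T
    for k in range(K, 0, -1):
        s = back[k][t]
        bounds.append((s, t))
        t = s
    bounds.reverse()
    return best[K][T], bounds
```

In an iterative training scheme of the kind the abstract outlines, boundaries returned by such a search could be used to relabel the frames before the sub-word models are retrained; how the paper actually performs this step is not specified here.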