Large vocabulary continuous speech recognition (LVCSR) systems fail to recognize words beyond their vocabulary, many of which are information-rich terms such as named entities, technical terms, or foreign words. Misrecognizing these Out-of-Vocabulary (OOV) words can have a disproportionate impact on transcript coherence and cause recognition failures that propagate through pipeline systems, degrading the performance of downstream applications. Ideally, a speech recognition system would be able to recognize arbitrary, even previously unseen, words.

This dissertation presents an approach to recover from failures caused by OOVs by automatically identifying when OOVs are spoken and transcribing them using sub-lexical units. The result is a hybrid word/sub-word system that predicts full words for in-vocabulary terms and sub-lexical units for OOVs. We first present an approach to model OOVs using sub-lexical units automatically learned from data. The learned units are variable-length phone sequences, which are included in the recognizer's vocabulary and language model. Previous work creates the sub-word lexicon heuristically from phonetic representations of text, using simple statistics to select common phone sequences. Instead, we propose a novel unsupervised approach that learns a sub-word lexicon optimized for a given task. This approach employs a log-linear model with overlapping features to learn multi-phone units obtained by segmenting the phonetic representation of a corpus.

OOV detection is the task of identifying regions in the recognizer's output where out-of-vocabulary words were uttered. Detecting OOV regions helps avoid error propagation to downstream applications such as machine translation, named entity recognition, and spoken document retrieval. We combine the proposed hybrid system with confidence-based metrics to improve OOV detection performance. Previous work addresses OOV detection as a binary classification task, where each region is classified independently using local information. This dissertation instead treats it as a sequence labeling problem, and shows that (1) jointly predicting out-of-vocabulary regions, (2) including contextual information from each region, and (3) learning sub-lexical units optimized for this task lead to substantial improvements over state-of-the-art systems.

The resulting sub-word representation and OOV detector help recover the correct spelling of new words, yielding an open-vocabulary system, and improve performance in downstream applications strongly affected by out-of-vocabulary terms, such as spoken term detection and named entity recognition in speech.
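
As a rough illustration of how a hybrid word/sub-word hypothesis can expose OOV regions, the sketch below merges runs of consecutive sub-word tokens into candidate regions. The token format (a leading underscore marking a sub-lexical unit, with phones joined by underscores), the `SUBWORD_PREFIX` constant, and the `oov_regions` function are assumptions made for this example only. They are not the dissertation's actual output format, and the sketch omits the confidence-based metrics and the sequence labeling model described above.

```python
from typing import List, Tuple

# Hypothetical convention: sub-lexical units start with "_" and
# join their phones with "_", e.g. "_s_ao" for the phones "s ao".
SUBWORD_PREFIX = "_"


def _phones(tokens: List[str]) -> str:
    """Flatten a run of sub-word tokens into a space-separated phone string."""
    phones: List[str] = []
    for tok in tokens:
        phones.extend(tok.strip("_").split("_"))
    return " ".join(phones)


def oov_regions(tokens: List[str]) -> List[Tuple[int, int, str]]:
    """Return (start, end_exclusive, phone_string) for each run of sub-word tokens."""
    regions: List[Tuple[int, int, str]] = []
    start = None  # index where the current sub-word run began, if any
    for i, tok in enumerate(tokens):
        if tok.startswith(SUBWORD_PREFIX):
            if start is None:
                start = i
        elif start is not None:
            # A full word closes the current run of sub-word units.
            regions.append((start, i, _phones(tokens[start:i])))
            start = None
    if start is not None:
        # The hypothesis ended while still inside a sub-word run.
        regions.append((start, len(tokens), _phones(tokens[start:])))
    return regions


hypothesis = ["flights", "to", "_s_ao", "_f_iy_ax", "tomorrow"]
print(oov_regions(hypothesis))
```

Each returned region gives the token span and the phone sequence that a later step could use to propose a spelling for the new word. A rule like this treats every sub-word run as an OOV, which is a simplification of the detection approach described in the abstract.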