Spoken language systems often rely on static speech recognizers. When the underlying models are improved on the fly, training is usually performed using unsupervised methods. In this work, we explore an alternative approach that uses human computation to provide crowd-supervised training of a deployed system. Although the framework we describe is applicable to any stochastic model whose training data can be generated by non-experts, we demonstrate its utility on the lexicon and language model of a speech recognizer in a cinema voice-search domain. We show how an initially shaky system can achieve an absolute improvement of over 10% in word error rate (WER), entirely without expert intervention. We then analyze how these gains were made.