Language models are used extensively in state-of-the-art speech recognition systems to help determine the probability of a hypothesized word sequence. These probabilities, along with the acoustic model scores, allow the system to constrain the search space during recognition to only those word sequences that have a reasonable chance of being correct. In order to determine these probabilities, knowledge of the entire problem space is necessary. However, in speech recognition, this is an unreasonable if not impossible task, especially when one is using the SWITCHBOARD corpus (a large corpus consisting of over 240 hours of recorded telephone conversations totaling almost 3 million words of text). Many statistical and rule-based approaches have been applied to this problem in order to arrive at a language model that produces the minimal word error rate (WER) of the recognizer. One technique includes part-of-speech (POS) information in the language model. This paper discusses the task of tagging the SWITCHBOARD corpus with POS information in the usual manner, and the problems encountered when trying to conform conversational speech to these tags.
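To make the POS-based language-modeling idea concrete, the following is a minimal sketch (not the paper's actual model) of a class-based bigram, where the probability of a word given its history is factored through POS tags: P(w_i, t_i | t_{i-1}) = P(w_i | t_i) · P(t_i | t_{i-1}). The tiny tagged corpus and tag set here are hypothetical stand-ins for SWITCHBOARD-style tagged transcripts, and no smoothing is applied.

```python
from collections import defaultdict

# Toy tagged corpus: lists of (word, POS) pairs. Hypothetical data,
# standing in for SWITCHBOARD-style tagged conversational transcripts.
corpus = [
    [("i", "PRP"), ("like", "VBP"), ("it", "PRP")],
    [("i", "PRP"), ("see", "VBP"), ("it", "PRP")],
]

tag_bigram = defaultdict(int)      # counts of (t_{i-1}, t_i)
tag_unigram = defaultdict(int)     # counts of t_i
word_given_tag = defaultdict(int)  # counts of (t_i, w_i)

for sentence in corpus:
    prev_tag = "<s>"  # sentence-start pseudo-tag
    for word, tag in sentence:
        tag_bigram[(prev_tag, tag)] += 1
        tag_unigram[tag] += 1
        word_given_tag[(tag, word)] += 1
        prev_tag = tag
tag_unigram["<s>"] = len(corpus)  # one start symbol per sentence

def prob(word, tag, prev_tag):
    """P(w_i, t_i | t_{i-1}) = P(w_i | t_i) * P(t_i | t_{i-1}).

    Unsmoothed relative-frequency estimates; a real system would
    smooth and sum over candidate tags for each word.
    """
    p_word = word_given_tag[(tag, word)] / tag_unigram[tag]
    p_tag = tag_bigram[(prev_tag, tag)] / tag_unigram[prev_tag]
    return p_word * p_tag

# Example: probability of sentence-initial "i" tagged PRP.
print(prob("i", "PRP", "<s>"))
```

Factoring through tag classes like this shares statistics across words with the same POS, which is one motivation for tagging a corpus such as SWITCHBOARD in the first place.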