Natural language processing (NLP) offers significant potential for enriching the communicative capabilities of a broad range of learning technologies. For example, both adaptive writing support environments and computer-assisted learning environments could benefit from robust NLP. However, because texts created by novice writers pose significant challenges for core NLP systems such as syntactic and semantic parsers, robust grammatical pre-processing systems must be introduced upstream in the NLP pipeline. These challenges are exacerbated by the fact that current methods for detecting and correcting ungrammatical text focus on identifying and repairing specific types of errors, or rely heavily on contextual clues that may be unreliable in highly disfluent text.

To address these problems, we propose a noisy channel model implemented with weighted finite state transducers (wFSTs), where weights represent the probability of transitioning between states, or in this case, words in a sentence. To construct our language model, we use a corpus of children's stories from Project Gutenberg. For the noise model, we use a corpus of passages composed by middle school students, obtained from corpus acquisition experiments. The EM algorithm identifies optimal a priori probabilities of encountering an erroneous form of a word. Preliminary results are encouraging and suggest that wFSTs offer significant promise for detecting and correcting highly disfluent text.
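The noisy channel framing above can be illustrated with a minimal toy decoder (this is a sketch of the general technique, not the authors' wFST implementation): the most likely intended sentence s given an observed sentence o maximizes P(s) * P(o | s), where P(s) comes from a language model and P(o | s) from a noise model. The bigram and noise probabilities below are invented solely for demonstration.

```python
import math

# Hypothetical bigram language-model probabilities P(w_i | w_{i-1});
# a real system would estimate these from a corpus (here, children's
# stories from Project Gutenberg).
BIGRAM = {
    ("<s>", "the"): 0.5,
    ("the", "dog"): 0.4,
    ("the", "dig"): 0.001,
    ("dog", "ran"): 0.3,
    ("dig", "ran"): 0.01,
}

def p_noise(observed, intended):
    # Hypothetical noise model: probability the writer produced the
    # observed word given the intended word (identity is most likely).
    return 0.9 if observed == intended else 0.05

def lm_logprob(words):
    # Log-probability of a word sequence under the bigram model,
    # with a small floor for unseen bigrams.
    logp = 0.0
    for prev, cur in zip(["<s>"] + words, words):
        logp += math.log(BIGRAM.get((prev, cur), 1e-6))
    return logp

def decode(observed, candidates):
    """Return the candidate maximizing log P(s) + log P(observed | s)."""
    def score(cand):
        channel = sum(math.log(p_noise(o, c)) for o, c in zip(observed, cand))
        return lm_logprob(cand) + channel
    return max(candidates, key=score)

observed = ["the", "dig", "ran"]
candidates = [["the", "dig", "ran"], ["the", "dog", "ran"]]
print(decode(observed, candidates))  # language model favors "the dog ran"
```

In a wFST realization, the language model and noise model would each be compiled into a transducer and composed, with decoding performed by a shortest-path search over the composed machine rather than the explicit candidate enumeration shown here.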