This chapter describes an unsupervised approach to natural language disambiguation, applicable to ambiguity problems where equivalence classes can be defined over the words in a lexicon. Lexical knowledge is induced from unambiguous words through these equivalence classes, which enables the automatic generation of annotated corpora. The only requirements are a lexicon and a raw text corpus. The method was tested on two natural language ambiguity tasks in several languages: part-of-speech tagging (English, Swedish, Chinese) and word sense disambiguation (English, Romanian). Classifiers trained on the automatically constructed corpora performed comparably to classifiers trained on expensive manually annotated data.
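The core idea can be illustrated with a minimal sketch for the part-of-speech case. It assumes, for illustration only, that an equivalence class is the set of tags a lexicon allows for a word, and that context is just the neighbouring words. The function name `build_training_data`, the feature choice, and the toy lexicon are hypothetical and are not taken from the chapter.

```python
from collections import defaultdict


def build_training_data(lexicon, sentences):
    """Collect labeled examples from unambiguous words in a raw corpus.

    lexicon:   dict mapping word -> set of possible tags (assumed format)
    sentences: list of token lists (raw, unannotated text)
    Returns a dict mapping each ambiguity class (a frozenset of tags)
    to a list of (features, tag) pairs.
    """
    # Ambiguity classes: tag sets of words that have more than one tag.
    classes = {frozenset(tags) for tags in lexicon.values() if len(tags) > 1}
    examples = defaultdict(list)

    for sent in sentences:
        for i, word in enumerate(sent):
            tags = lexicon.get(word, set())
            if len(tags) != 1:
                continue  # skip ambiguous or unknown words
            (tag,) = tags  # the single tag acts as a free label
            prev_word = sent[i - 1] if i > 0 else "<s>"
            next_word = sent[i + 1] if i + 1 < len(sent) else "</s>"
            features = (prev_word, next_word)  # toy context features
            # The example serves every ambiguity class containing its tag.
            for cls in classes:
                if tag in cls:
                    examples[cls].append((features, tag))
    return dict(examples)


# Toy data (hypothetical): "run" is ambiguous between NN and VB.
lexicon = {
    "the": {"DT"},
    "to": {"TO"},
    "dog": {"NN"},
    "eat": {"VB"},
    "run": {"NN", "VB"},
}
sentences = [["the", "dog"], ["to", "eat"], ["the", "run"]]
data = build_training_data(lexicon, sentences)
```

Here the unambiguous words "dog" and "eat" supply labeled contexts for the class {NN, VB}, so a classifier for "run" could be trained on them without any manual annotation. Any standard classifier could consume the resulting pairs; the sketch stops at corpus generation.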