Traditionally, word sense disambiguation (WSD) involves a different context classification model for each individual word. This paper presents a weakly supervised learning approach to WSD based on learning a word-independent context pair classification model. Statistical models are trained not to classify word contexts but to classify a pair of contexts, i.e. to determine whether a pair of contexts of the same ambiguous word refers to the same or different senses. Using this approach, the annotated corpus of a target word A can be used to disambiguate the senses of a different word B. Hence, only a limited amount of existing annotated corpus is required in order to disambiguate the entire vocabulary. In this research, maximum entropy modeling is used to train the word-independent context pair classification model. Then, based on the context pair classification results, clustering is performed on word mentions extracted from a large raw corpus. The resulting context clusters are mapped onto the external thesaurus WordNet. This approach shows great flexibility in efficiently integrating heterogeneous knowledge sources, e.g. trigger words and parsing structures. Based on Senseval-3 Lexical Sample standards, this approach achieves state-of-the-art performance in the unsupervised learning category, and performs comparably with the supervised Naïve Bayes system.
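The two-stage idea described above (score pairs of contexts, then cluster mentions from those scores) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `jaccard_same_sense_score` is a hypothetical stand-in for the trained maximum entropy pair classifier, the single-link threshold clustering in `cluster_contexts` is a simple substitute for whatever clustering procedure the paper uses, and the four example contexts are invented. The mapping of clusters onto WordNet is not shown.

```python
from itertools import combinations


def jaccard_same_sense_score(ctx_a, ctx_b):
    """Hypothetical stand-in for the trained pair classifier.

    Returns a score in [0, 1] from word overlap (Jaccard similarity).
    It never inspects which word is ambiguous, so the same function
    can score context pairs for any target word.
    """
    a, b = set(ctx_a.lower().split()), set(ctx_b.lower().split())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_contexts(contexts, pair_score, threshold=0.5):
    """Group context indices by single-link clustering.

    Two contexts are linked when pair_score(...) >= threshold;
    clusters are the connected components of those links
    (computed with a small union-find).
    """
    parent = list(range(len(contexts)))

    def find(i):
        # Follow parent pointers to the root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(contexts)), 2):
        if pair_score(contexts[i], contexts[j]) >= threshold:
            parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(contexts)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Invented mentions of the ambiguous word "bank".
contexts = [
    "deposit money bank account",
    "bank account money loan",
    "river bank water fishing",
    "fishing river water shore",
]
clusters = cluster_contexts(contexts, jaccard_same_sense_score)
print(clusters)  # prints [[0, 1], [2, 3]]
```

Because the scoring function takes only a pair of contexts, a model trained on annotated pairs for one word can in principle be applied to mentions of another, which is the property the approach relies on.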