Most previous corpus-based approaches to word-sense disambiguation (WSD) collect salient words from the context of a target word, but they suffer from data sparseness. To overcome this problem, this paper proposes a concept-based WSD method that uses an automatically generated sense-tagged corpus. Grammatical similarities between Korean and Japanese make it possible to construct a sense-tagged Korean corpus with an existing high-quality Japanese-to-Korean machine translation (MT) system. The sense-tagged corpus then serves as a knowledge source from which useful clues for WSD, such as concept co-occurrence information, can be extracted. The proposed WSD model is applied to a Korean-to-Japanese MT system and tested with various machine learning methods. In an evaluation, a weighted voting model achieved the best average precision of 81.50%, an improvement of 18.75% over the baseline, which suggests that the proposed method is promising for practical MT systems.
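
The abstract does not say how the weighted voting model combines its component classifiers, so the following is only a minimal sketch of generic weighted voting, not the authors' implementation. It assumes each classifier proposes one sense for the target word and carries a single non-negative weight (for instance, its precision on held-out data). The function name `weighted_vote`, the classifier names, and the sense labels are all hypothetical.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Pick the sense with the largest total weight.

    predictions: dict mapping classifier name -> predicted sense label
    weights:     dict mapping classifier name -> non-negative weight
                 (assumed here to be something like held-out precision)
    """
    if not predictions:
        raise ValueError("at least one prediction is required")

    # Sum the weights of all classifiers that voted for each sense.
    scores = defaultdict(float)
    for classifier, sense in predictions.items():
        scores[sense] += weights.get(classifier, 0.0)

    # Iterate over senses in sorted order so that ties are broken
    # deterministically (the alphabetically first label wins).
    return max(sorted(scores), key=lambda sense: scores[sense])


if __name__ == "__main__":
    # Hypothetical classifiers and senses for an ambiguous word.
    predictions = {
        "naive_bayes": "finance",
        "decision_tree": "river",
        "knn": "finance",
    }
    weights = {"naive_bayes": 0.6, "decision_tree": 0.9, "knn": 0.5}
    print(weighted_vote(predictions, weights))
```

In this example the two lower-weighted classifiers together outweigh the single highest-weighted one, which is the behavior that distinguishes weighted voting from simply trusting the best individual classifier.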