Data sparseness is a common problem for statistical corpus-based Word Sense Disambiguation (WSD) approaches [1]. A large corpus is usually required to guarantee that all senses of an ambiguous word are represented. This is hard to achieve, especially for words that occur infrequently in the training corpus, and the problem is even worse for languages that lack large amounts of digitized resources. In this paper, we present an unsupervised framework that first learns the relationships between ambiguous words and their related words to reveal their most suitable senses, based on a proposed mathematical model and a set of bilingual resources: a non-aligned Portuguese-Chinese training corpus, a dictionary, and a sense inventory. For senses not found in the learning phase, bilingual examples from the dictionary and Singular Value Decomposition (SVD) [2] techniques are applied to overcome the sparseness problem. All senses found are converted into a set of rules and stored in the Word Sense database for later use in the disambiguation and translation processes. Preliminary experimental results show that these strategies allow more senses to be learned in a sparse-data setting.