In information retrieval, Latent Semantic Analysis (LSA) is a method for handling large, sparse document vectors. LSA reduces the dimension of document vectors by producing a set of topics that are statistically related to the documents and terms. Because the topics are derived statistically, LSA requires a sufficiently large number of words and takes no account of the semantic relations between words. In this paper, we cluster words using the semantic distances between them, so that the dimension of the document vectors is reduced to the number of word clusters. Word distances can be calculated using WordNet. This method does not depend on the number of words or documents. For especially short documents, we use the dictionary definitions of words to calculate the similarities between documents.
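The reduction from word-level to cluster-level document vectors can be illustrated with a minimal sketch. It is not the paper's implementation: the distance table `TOY_DISTANCE`, the single-link merging rule, the threshold value, and all function names are assumptions made for illustration, with the hand-written distances standing in for WordNet-based ones.

```python
import math
from collections import Counter

# Hypothetical word-to-word distances standing in for WordNet-based values.
TOY_DISTANCE = {
    frozenset(("car", "automobile")): 0.0,
    frozenset(("car", "truck")): 0.3,
    frozenset(("automobile", "truck")): 0.3,
    frozenset(("apple", "banana")): 0.2,
}


def word_distance(a, b):
    """Return the toy distance; unlisted pairs are treated as maximally distant."""
    if a == b:
        return 0.0
    return TOY_DISTANCE.get(frozenset((a, b)), 1.0)


def cluster_words(vocab, threshold):
    """Single-link clustering (an assumed rule): merge words whose distance <= threshold."""
    parent = {w: w for w in vocab}

    def find(w):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for i, a in enumerate(vocab):
        for b in vocab[i + 1:]:
            if word_distance(a, b) <= threshold:
                parent[find(a)] = find(b)

    roots = sorted({find(w) for w in vocab})
    index = {root: k for k, root in enumerate(roots)}
    return {w: index[find(w)] for w in vocab}


def document_vector(tokens, word_to_cluster, n_clusters):
    """Count word occurrences per cluster, so the vector has one entry per cluster."""
    vec = [0] * n_clusters
    for token, count in Counter(tokens).items():
        if token in word_to_cluster:
            vec[word_to_cluster[token]] += count
    return vec


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


docs = [["car", "truck"], ["automobile"], ["apple", "banana"]]
vocab = sorted({t for d in docs for t in d})
word_to_cluster = cluster_words(vocab, threshold=0.35)
n_clusters = len(set(word_to_cluster.values()))
vectors = [document_vector(d, word_to_cluster, n_clusters) for d in docs]
```

In this toy setting the five-word vocabulary collapses into two clusters, so the first two documents become similar even though they share no surface word, while the third stays orthogonal to both. No corpus statistics are involved, which is why the approach does not depend on the number of words or documents.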