We propose a method for generating concept vectors, the meaning representations of words, that takes the co-occurrences between all words into account. Each word is assigned a random, unique set of numbers, and a co-occurrence matrix between words and numbers is built. Although the resulting vectors contain information on the co-occurrences between all words, the memory needed to generate and use them does not increase. We also propose a second method in which the word concept vectors generated this way are clustered, each word is assigned the number of its cluster, and a co-occurrence matrix between words and clusters is built and combined with the co-occurrence matrix between words and numbers. Measuring the accuracy of various language processing tasks with concept vectors generated by these methods, we confirmed that accuracy improves over concept vectors generated by the conventional method.
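The two constructions described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the vector dimension, the size of each number set, the context window, or the clustering algorithm, so `dim`, `set_size`, `window`, all function names, and the hand-written `cluster_of` mapping are assumptions made for illustration only.

```python
import math
import random


def assign_number_sets(vocab, dim, set_size, seed=0):
    """Assign each word a random, unique set of column indices in [0, dim).

    `dim` and `set_size` are illustrative assumptions, not values from the paper.
    """
    if len(vocab) > math.comb(dim, set_size):
        raise ValueError("not enough distinct number sets for this vocabulary")
    rng = random.Random(seed)
    used = set()
    number_sets = {}
    for word in vocab:
        candidate = frozenset(rng.sample(range(dim), set_size))
        while candidate in used:  # redraw until the set is unique
            candidate = frozenset(rng.sample(range(dim), set_size))
        used.add(candidate)
        number_sets[word] = candidate
    return number_sets


def context_pairs(sentences, window):
    """Yield (word, neighbour) pairs at most `window` positions apart."""
    for sentence in sentences:
        for i, word in enumerate(sentence):
            lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    yield word, sentence[j]


def word_number_matrix(sentences, number_sets, dim, window):
    """Count, per word, how often each number of its neighbours co-occurs."""
    vectors = {w: [0] * dim for w in number_sets}
    for word, neighbour in context_pairs(sentences, window):
        for number in number_sets[neighbour]:
            vectors[word][number] += 1
    return vectors


def word_cluster_matrix(sentences, cluster_of, n_clusters, window):
    """Count, per word, how often each neighbour cluster co-occurs."""
    vectors = {w: [0] * n_clusters for w in cluster_of}
    for word, neighbour in context_pairs(sentences, window):
        vectors[word][cluster_of[neighbour]] += 1
    return vectors


# Toy corpus (hypothetical data).
sentences = [["cat", "eats", "fish"], ["dog", "eats", "meat"]]
vocab = sorted({w for s in sentences for w in s})

number_sets = assign_number_sets(vocab, dim=8, set_size=2)
wn = word_number_matrix(sentences, number_sets, dim=8, window=1)

# Hypothetical clustering result; in the described method it would come
# from clustering the word-number concept vectors.
cluster_of = {"cat": 0, "dog": 0, "eats": 1, "fish": 2, "meat": 2}
wc = word_cluster_matrix(sentences, cluster_of, n_clusters=3, window=1)

# Combine both matrices by concatenating each word's two rows.
combined = {w: wn[w] + wc[w] for w in vocab}
```

In this sketch every row has `dim + n_clusters` columns no matter how many words the vocabulary contains, which is one way to read the claim that memory usage does not grow even though every word contributes to the co-occurrence counts.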