Word2Vec is a language-processing toolkit open-sourced by Google in 2013. While training a neural-network language model, it represents words as real-valued vectors and finds semantically similar words by cosine distance in the vector space, with high training efficiency. In applying Word2Vec to train word vectors, we experiment with the Chinese word-segmentation and algorithm-selection steps that may affect training, and analyze part of the core source code in depth, in order to identify the strategy that yields the best training results. The resulting scheme improves Word2Vec's performance and provides higher-quality word vectors for downstream applications.
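The abstract mentions finding semantically similar words via cosine distance in the vector space. As a minimal sketch of that retrieval step, the snippet below ranks a vocabulary by cosine similarity to a query word. The vectors here are toy values invented for illustration, not real Word2Vec output, and the helper names (`cosine_similarity`, `most_similar`) are our own, not part of the paper's code.

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy word vectors (illustrative only, not trained embeddings)
vectors = {
    "king":  np.array([0.8, 0.6, 0.1]),
    "queen": np.array([0.7, 0.7, 0.2]),
    "apple": np.array([0.1, 0.2, 0.9]),
}

def most_similar(word, vocab, topn=2):
    # Rank every other word in the vocabulary by cosine similarity
    query = vocab[word]
    scores = [(w, cosine_similarity(query, v))
              for w, v in vocab.items() if w != word]
    return sorted(scores, key=lambda x: x[1], reverse=True)[:topn]

print(most_similar("king", vectors))  # "queen" ranks above "apple"
```

In practice one would query a trained model's embedding matrix the same way; libraries such as gensim expose an equivalent `most_similar` operation directly on the trained vectors.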