TEXT INFORMATION SIMILARITY MATCHING METHOD AND APPARATUS, COMPUTER DEVICE, AND STORAGE MEDIUM

Abstract

Provided are a TF-IDF-based text information similarity matching method and apparatus. The method comprises the following steps:

- acquiring text information;
- performing word segmentation on the text information to obtain segmented words $w_1, w_2, \ldots, w_{n-1}, w_n$;
- using a CBOW (continuous bag-of-words) model to calculate the word vectors $V(w_1), V(w_2), \ldots, V(w_{n-1}), V(w_n)$ of the segmented words;
- using a TF-IDF algorithm to calculate the TF-IDF values $k_1, k_2, \ldots, k_{n-1}, k_n$ of the segmented words;
- obtaining a sentence vector $V$ from the products of the word vectors of the segmented words and their corresponding TF-IDF values;
- calculating the cosine similarity between the sentence vector $V$ and the sentence vectors of pre-stored statements, and determining the pre-stored statement with the maximum cosine similarity.

This process finds the pre-stored statement that is most similar to the text information. It can improve the accuracy of question recognition in applications such as robot conversation and information classification, and so improve conversation efficiency or classification efficiency. A computer device and a storage medium are further provided.
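The weighting steps can be sketched in Python. This is a minimal sketch that rests on several assumptions the abstract does not state:

- **TF-IDF variant.** The weight of a word is its term frequency within the input text multiplied by a smoothed inverse document frequency. The IDF is computed over a corpus of already-segmented statements.
- **Combining products.** The products $k_i V(w_i)$ are combined by summation into $V$.
- **Missing vectors.** Words without a word vector are skipped.

The word vectors are taken as given. They could come from a CBOW model, for example gensim's `Word2Vec` trained with `sg=0`. The function names `tfidf_weights` and `sentence_vector` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Mapping, Sequence


def tfidf_weights(words: Sequence[str], corpus: Sequence[Sequence[str]]) -> List[float]:
    """Return one TF-IDF value k_i for each segmented word w_i.

    Assumed variant: tf = count / len(words),
    idf = log((1 + N) / (1 + df)) + 1, where N is the corpus size.
    """
    counts = Counter(words)
    doc_sets = [set(doc) for doc in corpus]  # each document's vocabulary
    n_docs = len(doc_sets)
    weights = []
    for w in words:
        tf = counts[w] / len(words)
        df = sum(1 for doc in doc_sets if w in doc)  # documents containing w
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        weights.append(tf * idf)
    return weights


def sentence_vector(
    words: Sequence[str],
    word_vectors: Mapping[str, Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Combine k_i * V(w_i) into a sentence vector V (summation is an assumption)."""
    dim = len(next(iter(word_vectors.values())))
    v = [0.0] * dim
    for w, k in zip(words, weights):
        vec = word_vectors.get(w)
        if vec is None:
            continue  # assumption: out-of-vocabulary words contribute nothing
        for j in range(dim):
            v[j] += k * vec[j]
    return v
```

Summing instead of averaging the weighted vectors does not change the later comparison. Averaging only divides $V$ by a positive constant, and cosine similarity ignores vector length.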

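The matching step computes a cosine similarity against each pre-stored statement and then takes the maximum. It can be sketched the same way, under a few assumptions:

- The sentence vectors of the pre-stored statements have already been computed.
- They are held in a dictionary keyed by statement text.
- A zero vector is treated as having similarity 0.0 with everything.

`cosine_similarity` and `best_match` are hypothetical names.

```python
import math
from typing import Mapping, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(a, b) = (a . b) / (|a| |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector matches nothing
    return dot / (norm_a * norm_b)


def best_match(
    query: Sequence[float], stored: Mapping[str, Sequence[float]]
) -> Tuple[str, float]:
    """Return (statement, score) for the pre-stored statement with maximum cosine similarity.

    Raises ValueError if `stored` is empty.
    """
    return max(
        ((text, cosine_similarity(query, vec)) for text, vec in stored.items()),
        key=lambda item: item[1],
    )
```

For example, `best_match(V, stored_vectors)` returns the closest pre-stored statement and its score. A caller could also compare that score against a threshold before accepting the match, though the abstract does not mention one.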