Journal: Chinese Journal of Electronics

A Text Similarity Measurement Based on Semantic Fingerprint of Characteristic Phrases



Abstract

Text similarity measurement is the basis for quantifying how closely two or more texts match. Traditional large-scale similarity detection methods based on digital fingerprints are fast, but they are suitable only for exact-match detection. We propose a Chinese text similarity measurement method based on the semantics of feature phrases. Natural language processing (NLP) techniques are used to pre-process the text, keywords are extracted with the term frequency-inverse document frequency (TF-IDF) model, and the keywords are further screened to select the feature words. Using the HowNet semantic dictionary, we determine the exact sense of each word and the semantic similarity between words. We then substitute concepts to obtain the feature phrases, generate a semantic fingerprint, and calculate similarity. Experimental results indicate that the proposed method outperforms the traditional digital fingerprinting method in similarity detection in terms of accuracy rate, recall rate, and F-value.
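
The pipeline described in the abstract (TF-IDF keywords, concept substitution, semantic fingerprint, similarity) can be illustrated with a minimal sketch. Several parts are assumptions and are not taken from the paper: the abstract does not name the fingerprint algorithm, so SimHash with a Hamming-distance similarity is used here as one common choice; the CONCEPTS table is a hypothetical toy stand-in for a HowNet lookup; the function names are illustrative; and the input is assumed to be already tokenized (Chinese text would first need word segmentation).

```python
import hashlib
import math
from collections import Counter

BITS = 64  # fingerprint width; matches the 8 hash bytes used below

# Hypothetical stand-in for a HowNet lookup: word -> canonical concept.
CONCEPTS = {"car": "vehicle", "automobile": "vehicle",
            "quick": "fast", "rapid": "fast"}

def tfidf_keywords(doc_tokens, corpus, top_k=5):
    """Return the top_k terms of one document with their TF-IDF weights."""
    tf = Counter(doc_tokens)
    n_docs = len(corpus)
    scores = {}
    for term, count in tf.items():
        df = sum(1 for d in corpus if term in d)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed IDF
        scores[term] = (count / len(doc_tokens)) * idf
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:top_k])

def to_concepts(weights):
    """Replace each keyword by its concept and merge the weights."""
    merged = Counter()
    for term, w in weights.items():
        merged[CONCEPTS.get(term, term)] += w
    return dict(merged)

def simhash(weights):
    """SimHash (assumed algorithm): weighted per-bit vote over feature hashes."""
    votes = [0.0] * BITS
    for feature, w in sorted(weights.items()):
        digest = hashlib.md5(feature.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for i in range(BITS):
            votes[i] += w if (h >> i) & 1 else -w
    return sum(1 << i for i in range(BITS) if votes[i] > 0)

def similarity(fp_a, fp_b):
    """One minus the normalized Hamming distance between two fingerprints."""
    return 1.0 - bin(fp_a ^ fp_b).count("1") / BITS

def fingerprint(tokens, corpus, top_k=5):
    """Full sketch pipeline: keywords -> concepts -> fingerprint."""
    return simhash(to_concepts(tfidf_keywords(tokens, corpus, top_k)))
```

In this sketch, two documents that differ only in words mapped to the same concept (for example "car" and "automobile") produce the same fingerprint, which is the kind of non-exact match that a purely lexical fingerprint would miss.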
