Journal: 计算机技术与发展 (Computer Technology and Development)

Internet Neologism Recognition Combining Knowledge Graphs and the ESA Method

     

Abstract

With the rapid growth of the Internet, text forms such as Weibo and WeChat are used more and more, and analyzing and understanding such text poses new challenges for natural language processing, particularly in recognizing Internet neologisms and understanding their semantics. To overcome the inability of traditional methods to recognize Internet neologisms and their meanings, this paper proposes a neologism recognition method that combines a knowledge graph with explicit semantic analysis (ESA). The method segments the original text at the coarse granularity of phrases so that the logical relationships between words are preserved. It first matches the semantic expression of each phrase against the Baidu knowledge graph Schema, then progressively decomposes the remaining text with ESA and distills each phrase's encyclopedia information into core semantic words, which cover the parts the Schema cannot recognize. Experimental results show that, compared with existing neologism recognition algorithms, the proposed algorithm needs only a small corpus as its underlying knowledge base, substantially reduces the cost of writing manual rules, and improves both the recognition accuracy for Internet neologisms and the accuracy of word understanding.
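The two-stage flow described in the abstract (schema lookup first, ESA as a fallback) can be illustrated with a small sketch. This is not the authors' implementation: the `SCHEMA` dictionary and the `CONCEPTS` mini-encyclopedia are hypothetical stand-ins for the Baidu knowledge graph Schema and for encyclopedia articles, the input is assumed to be already segmented into phrases, and "distilling core semantic words" is approximated by picking the highest-weighted ESA concept.

```python
import math
from collections import Counter

# Hypothetical stand-in for a knowledge graph schema: phrase -> semantic type.
SCHEMA = {"knowledge graph": "Concept", "weibo": "Platform"}

# Hypothetical toy encyclopedia: concept name -> article text.
CONCEPTS = {
    "social media": "weibo wechat post share follower repost social platform",
    "linguistics": "word phrase semantics neologism language meaning",
    "food": "noodle rice soup cook restaurant meal",
}

def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()

def build_index(concepts):
    """Build the ESA inverted index: term -> {concept: TF-IDF weight}."""
    n = len(concepts)
    tf = {name: Counter(tokenize(body)) for name, body in concepts.items()}
    df = Counter(term for counts in tf.values() for term in counts)
    index = {}
    for name, counts in tf.items():
        for term, count in counts.items():
            idf = math.log(n / df[term]) + 1.0  # +1 keeps ubiquitous terms non-zero
            index.setdefault(term, {})[name] = count * idf
    return index

def esa_vector(text, index):
    """Represent text as a weighted vector over encyclopedia concepts."""
    vec = Counter()
    for term in tokenize(text):
        for name, weight in index.get(term, {}).items():
            vec[name] += weight
    return vec

def interpret(phrases, schema, index):
    """Try the schema first; fall back to the top ESA concept."""
    results = []
    for phrase in phrases:
        key = phrase.lower()
        if key in schema:
            results.append((phrase, "schema", schema[key]))
            continue
        vec = esa_vector(phrase, index)
        if vec:
            results.append((phrase, "esa", max(vec, key=vec.get)))
        else:
            results.append((phrase, "unknown", None))
    return results

if __name__ == "__main__":
    index = build_index(CONCEPTS)
    for row in interpret(["Weibo", "repost neologism meaning", "xyzzy"], SCHEMA, index):
        print(row)
```

A phrase found in the schema is resolved directly, so ESA runs only on the remainder; this mirrors the ordering in the abstract, where ESA supplements what the Schema cannot recognize.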
