Method for recognizing network text named entity based on neural network probability disambiguation

Abstract

A method for recognizing named entities in network text based on neural network probability disambiguation. The method comprises: performing word segmentation on an unlabeled corpus and extracting word vectors with Word2Vec; converting a sample corpus into a word feature matrix and windowing it; building and training a deep neural network whose output layer applies a softmax function for normalization, so as to obtain a probability matrix giving, for each word, the probability of each named entity category; and re-windowing the probability matrix and applying a conditional random field model for disambiguation, so as to obtain the final named entity annotation. Because network text contains network vocabulary and newly coined words, the method provides incremental word vector learning that does not change the structure of the neural network. Because network text also has nonstandard grammatical structure and many wrongly written characters, the method uses probability disambiguation. Higher recognition accuracy can thereby be achieved.
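The windowing, softmax normalization, and disambiguation steps named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the patented implementation: the label set, the example logits, the `TRANSITION` scores, and the helper names `window`, `softmax`, and `viterbi` are all assumptions made for illustration, and a plain Viterbi decode over hand-set transition scores stands in for the trained conditional random field.

```python
import math

def window(rows, size, pad):
    """Concatenate each row with its neighbours inside a centred window of odd `size`."""
    half = size // 2
    padded = [pad] * half + list(rows) + [pad] * half
    return [sum((padded[i + k] for k in range(size)), []) for i in range(len(rows))]

def softmax(scores):
    """Normalize raw output-layer scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def viterbi(prob_matrix, transition):
    """Return the best label index sequence given per-word probabilities
    and log-scale transition scores transition[previous][current]."""
    n = len(prob_matrix[0])
    score = [math.log(max(p, 1e-12)) for p in prob_matrix[0]]
    back = []
    for probs in prob_matrix[1:]:
        new_score, pointers = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + transition[i][j])
            new_score.append(score[best_i] + transition[best_i][j]
                             + math.log(max(probs[j], 1e-12)))
            pointers.append(best_i)
        score = new_score
        back.append(pointers)
    path = [max(range(n), key=lambda j: score[j])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Hypothetical label set and network outputs for a three-word sentence.
LABELS = ["O", "B-PER", "I-PER"]
logits = [[3.0, 0.0, 0.0],   # word 1: clearly outside any entity
          [0.0, 1.0, 1.2],   # word 2: ambiguous between B-PER and I-PER
          [0.0, 0.0, 2.0]]   # word 3: clearly inside an entity
prob_matrix = [softmax(row) for row in logits]

# Assumed transition scores: an entity may not continue directly after "O".
TRANSITION = [[0.0, 0.0, -10.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]]

# Per-word argmax ignores context; the decode uses neighbouring labels.
greedy = [LABELS[max(range(len(p)), key=p.__getitem__)] for p in prob_matrix]
decoded = [LABELS[i] for i in viterbi(prob_matrix, TRANSITION)]

print(window([[1.0], [2.0], [3.0]], 3, [0.0]))  # [[0.0, 1.0, 2.0], [1.0, 2.0, 3.0], [2.0, 3.0, 0.0]]
print(greedy)   # ['O', 'I-PER', 'I-PER']
print(decoded)  # ['O', 'B-PER', 'I-PER']
```

In this toy case the per-word argmax yields an inconsistent sequence, while decoding over the whole probability matrix resolves the ambiguous second word. In the method described in the abstract, the conditional random field would take the re-windowed probability matrix as its features rather than fixed transition scores.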
