Traditional named entity recognition (NER) methods rely directly on large numbers of hand-crafted features and specialized domain knowledge to compensate for the scarcity of supervised training corpora, but designing such features and acquiring such knowledge is costly. To address this problem, a named entity recognition method based on a BLSTM (Bidirectional Long Short-Term Memory) neural network architecture is proposed. Instead of depending directly on hand-crafted features and domain knowledge, the method uses context-based word embeddings and character-based word embeddings: the former capture the contextual information of named entities, and the latter capture the prefix, suffix, and domain information of the characters that make up a named entity. In addition, the correlations between labels in the tagged sequence are used to constrain the cost function of the BLSTM, and domain knowledge is embedded into the cost function, further strengthening the model's recognition ability. Experiments show that the proposed method outperforms traditional methods.
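The abstract does not spell out how label correlations enter the cost function. One common way to realize such a constraint is a CRF-style transition score added to the per-token label scores, and the minimal sketch below assumes that form. The label set, the score values, and the names `path_score` and `viterbi` are all hypothetical and chosen only for illustration; the per-token scores stand in for what a BLSTM would emit.

```python
from typing import Dict, List, Tuple

# Hypothetical label set (BIO scheme for a single entity type).
LABELS = ["O", "B-PER", "I-PER"]

# Assumed per-token label scores, standing in for BLSTM outputs.
EMISSIONS: List[Dict[str, float]] = [
    {"O": 1.0, "B-PER": 0.5, "I-PER": 0.0},
    {"O": 0.2, "B-PER": 0.1, "I-PER": 1.0},
]

# Assumed transition scores: every pair is neutral except O -> I-PER,
# which is heavily penalized because an entity cannot continue from O.
TRANSITIONS: Dict[Tuple[str, str], float] = {
    (a, b): 0.0 for a in LABELS for b in LABELS
}
TRANSITIONS[("O", "I-PER")] = -10.0


def path_score(emissions, transitions, path):
    """Sum of per-token scores plus label-to-label transition scores."""
    score = emissions[0][path[0]]
    for t in range(1, len(path)):
        score += transitions[(path[t - 1], path[t])] + emissions[t][path[t]]
    return score


def viterbi(emissions, transitions, labels):
    """Return the highest-scoring label path and its score."""
    # best[y] = (score, path) of the best partial path ending in label y.
    best = {y: (emissions[0][y], [y]) for y in labels}
    for t in range(1, len(emissions)):
        new_best = {}
        for y in labels:
            prev = max(labels, key=lambda p: best[p][0] + transitions[(p, y)])
            score = best[prev][0] + transitions[(prev, y)] + emissions[t][y]
            new_best[y] = (score, best[prev][1] + [y])
        best = new_best
    final = max(labels, key=lambda y: best[y][0])
    return best[final][1], best[final][0]


if __name__ == "__main__":
    # Picking the top label per token independently ignores label dependencies.
    greedy = [max(LABELS, key=lambda y: e[y]) for e in EMISSIONS]
    print("greedy :", greedy, path_score(EMISSIONS, TRANSITIONS, greedy))
    print("viterbi:", viterbi(EMISSIONS, TRANSITIONS, LABELS))
```

In this toy input, choosing the best label for each token independently yields the invalid sequence O, I-PER, whereas scoring whole sequences with the transition term favors B-PER, I-PER. In a trained model the transition scores would be learned jointly with the network rather than set by hand.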