International Conference on System Science and Engineering
A Word Similarity Feature-based Semi-supervised Approach for Named Entity Recognition

Abstract

Named Entity Recognition (NER) is an important branch of Natural Language Processing (NLP). Among existing NER methods, one of the most advanced and widely deployed is the Long Short-Term Memory network with a Conditional Random Field layer (LSTM-CRF). However, this supervised method generally requires a large labeled corpus, and labeled data are very limited for the drug patent texts considered in this study. With this in mind, a word similarity feature-based semi-supervised NER approach is proposed. Word similarity features with respect to the various entity types are first extracted from word embeddings to form a similarity constraint. They are then combined with the features computed by the supervised LSTM. Finally, the tagged results are obtained through the CRF layer. By introducing the similarity features of word embeddings into the LSTM-CRF model, the proposed method can greatly reduce the number of entities left untagged among large numbers of similar entities. Experimental studies demonstrate that the proposed method shows clear advantages in both accuracy and comprehensiveness over the traditional baseline model and other semi-supervised methods.
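
The abstract does not say how the similarity features are computed. As a rough illustration only, the sketch below assumes one plausible reading: each entity type gets a prototype vector (the mean embedding of a few seed words), and a word's similarity feature for that type is the cosine similarity between its embedding and the prototype. The entity types `DRUG` and `DISEASE`, the seed words, the toy 3-dimensional embeddings, and the placeholder `lstm_features` vector are all hypothetical and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def similarity_features(word_vec, prototypes):
    """One similarity score per entity type, ordered by sorted type name."""
    return [cosine(word_vec, prototypes[t]) for t in sorted(prototypes)]

# Toy embeddings, made up for illustration (assumption, not from the paper).
embeddings = {
    "aspirin": [0.9, 0.1, 0.0],
    "ibuprofen": [0.8, 0.2, 0.1],
    "hypertension": [0.1, 0.9, 0.1],
    "diabetes": [0.0, 0.8, 0.2],
    "naproxen": [0.85, 0.15, 0.05],
}

# Hypothetical seed words per entity type.
seeds = {
    "DRUG": ["aspirin", "ibuprofen"],
    "DISEASE": ["hypertension", "diabetes"],
}
prototypes = {
    entity_type: centroid([embeddings[w] for w in words])
    for entity_type, words in seeds.items()
}

# Similarity features for an unlabeled word, ordered as [DISEASE, DRUG].
feats = similarity_features(embeddings["naproxen"], prototypes)

# Placeholder for the per-token features a supervised LSTM would produce.
lstm_features = [0.0, 0.0, 0.0, 0.0]

# Concatenation is one simple way to combine the two feature sets
# before they are passed to a CRF layer (assumption).
combined = lstm_features + feats
```

In this toy setup the unlabeled word "naproxen" scores higher against the `DRUG` prototype than against the `DISEASE` prototype, which is the kind of signal that could help tag entities resembling known ones even when they never appear in the labeled data.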