Chinese named entity recognition (NER) is harder than English NER because Chinese words lack explicit boundaries and capitalization, and a single character can carry different meanings in different words. Exploiting the properties of word embeddings, this paper proposes a character-word joint method for deep learning frameworks that combines character features and word features in a unified way. The method mitigates the segmentation-error propagation and dictionary sparsity that affect word-only features, and it reduces the loss of context that a fixed window size causes for character-only features. Adding part-of-speech information to the word features further improves performance. Experiments on the 1998 People's Daily corpus show improvements of at least 1.6%, 8%, and 3% on location, person, and organization recognition, respectively, compared with character-only or word-only features. With part-of-speech features added, the character-word joint method reaches F1 scores of 96.8%, 94.6%, and 88.6% on the same three tasks.
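The abstract does not say how the character and word features are combined, so the following is only a minimal sketch of one plausible reading: each character's vector is concatenated with the vector of the word that contains it and with a part-of-speech vector for that word. The function names (`make_table`, `joint_features`), the embedding sizes, the random initialisation, and the example segmentation and tags are all illustrative assumptions, not the paper's actual model.

```python
import random


def make_table(vocab, dim, seed):
    """Hypothetical embedding table: one small random vector per token."""
    rng = random.Random(seed)
    return {tok: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for tok in vocab}


def joint_features(segmented, char_emb, word_emb, pos_emb):
    """Build one feature vector per character.

    segmented: list of (word, pos_tag) pairs from a word segmenter/tagger.
    Each character vector is the concatenation of
    [character embedding | embedding of its word | embedding of the word's POS].
    Concatenation is an assumption made for this sketch.
    """
    feats = []
    for word, pos in segmented:
        for ch in word:
            # List "+" concatenates the three vectors end to end.
            feats.append(char_emb[ch] + word_emb[word] + pos_emb[pos])
    return feats


# Illustrative input: a segmented, POS-tagged sentence (tags are assumed).
segmented = [("北京大学", "nt"), ("位于", "v"), ("北京", "ns")]

chars = sorted({ch for word, _ in segmented for ch in word})
words = sorted({word for word, _ in segmented})
tags = sorted({pos for _, pos in segmented})

char_emb = make_table(chars, dim=4, seed=0)
word_emb = make_table(words, dim=3, seed=1)
pos_emb = make_table(tags, dim=2, seed=2)

feats = joint_features(segmented, char_emb, word_emb, pos_emb)
print(len(feats), len(feats[0]))  # 8 9
```

In this sketch the character 北 receives the same character part in both of its occurrences, but a different word part depending on whether it sits inside 北京大学 or 北京, which is the kind of disambiguating context a character-only feature with a fixed window can miss.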