Self-Training-Based Boundary Identification of Chinese Open-Domain Named Entities

Fu Ruiji (付瑞吉); Qin Bing (秦兵); Liu Ting (刘挺)


Abstract

Named entity recognition is an important task in natural language processing and supports many higher-level applications. This paper studies boundary identification for Chinese open-domain named entities. Training corpora for this task are currently lacking, and manual annotation is too costly. The paper therefore first annotates a large-scale Chinese proper noun corpus automatically, using bilingual parallel corpora and an English syntactic parser, and separately generates a noun compound corpus from a Chinese dependency treebank. It then uses a self-training approach to merge the two corpora into a named entity boundary identification corpus while training the boundary identification model at the same time. Experimental results show that self-training makes full use of both corpora and improves the precision and recall of boundary identification.
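The abstract gives no algorithmic detail, so the following is only a minimal sketch of how a self-training loop might merge two partially labeled corpora. Everything in it is an assumption made for illustration: the BIO tag set, the toy unigram tagger standing in for the real boundary model, the rule that existing entity tags are never overwritten, and the names `train`, `predict`, `merge`, and `self_train`. It is not the authors' implementation.

```python
from collections import Counter, defaultdict

def train(corpus):
    """Toy tagger: remember the most frequent tag seen for each token.
    A stand-in for the real sequence model, which the abstract does not specify."""
    counts = defaultdict(Counter)
    for tokens, tags in corpus:
        for tok, tag in zip(tokens, tags):
            counts[tok][tag] += 1
    return {tok: c.most_common(1)[0][0] for tok, c in counts.items()}

def predict(model, tokens):
    # Unknown tokens default to "O" (outside any entity).
    return [model.get(tok, "O") for tok in tokens]

def merge(existing, predicted):
    # Assumed rule: keep every existing entity tag, fill "O" positions
    # with the model's prediction.
    return [e if e != "O" else p for e, p in zip(existing, predicted)]

def self_train(corpus_a, corpus_b, rounds=2):
    """Alternately label each corpus with a model trained on the other,
    then train a final model on the merged result."""
    a, b = list(corpus_a), list(corpus_b)
    for _ in range(rounds):
        model_a = train(a)
        b = [(toks, merge(tags, predict(model_a, toks))) for toks, tags in b]
        model_b = train(b)
        a = [(toks, merge(tags, predict(model_b, toks))) for toks, tags in a]
    merged = a + b
    return train(merged), merged

# Hypothetical toy data: corpus A marks only proper nouns,
# corpus B marks only noun compounds.
proper_nouns = [(["Harbin", "has", "a", "research", "lab"],
                 ["B", "O", "O", "O", "O"])]
noun_compounds = [(["the", "research", "lab", "is", "in", "Harbin"],
                   ["O", "B", "I", "O", "O", "O"])]

model, merged = self_train(proper_nouns, noun_compounds)
print(merged[0][1])
print(predict(model, ["Harbin", "research", "lab"]))
```

In this toy run each corpus ends up carrying both kinds of boundary labels, which is the effect the abstract attributes to the fusion step. A real system would need a contextual model and a confidence threshold before accepting predicted labels.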

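Bibliographic details

Source
《智能计算机与应用》 (Intelligent Computer and Applications) | 2014, Issue 4 | pp. 1-4, 8 | 5 pages in total
Authors
Fu Ruiji (付瑞吉); Qin Bing (秦兵); Liu Ting (刘挺)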
Author affiliation
School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001 (all three authors)
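Original format: PDF
Language of text: Chinese
Chinese Library Classification: TP391.12
Keywords
open-domain named entity recognition; self-training; training corpus fusion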

