Published in: 电子技术应用

A Named Entity Recognition Method for Microblog Text

     

Abstract

Named entity recognition (NER) is a fundamental technique in natural language processing (NLP). Social networking platforms such as microblogs have grown rapidly in recent years, and the distinctive form of their text poses new challenges for traditional NER techniques. This paper proposes an improved method based on the conditional random field (CRF) model. Because microblog texts are short and semantically ambiguous, external data sources are used to extract topic features and word-vector features for training the model. Because microblog data is large in scale and manual standardization is costly, an active learning algorithm based on least confidence is adopted to strengthen model training at a lower labor cost. Experiments on a Sina Weibo data set show that the method improves the F-score by 4.54% over the traditional CRF method.
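The abstract names least-confidence active learning but gives no details, so the following is only a minimal sketch of the general technique, not the authors' implementation. It assumes a linear-chain model whose log-potentials are already available as plain lists; the names `sequence_confidence`, `select_least_confident`, `emissions`, and `transitions` are hypothetical and chosen for illustration. Confidence is taken to be the probability of the single best label sequence, and the samples with the lowest confidence are the ones sent for manual labeling.

```python
import math
from typing import Dict, List, Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def sequence_confidence(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> float:
    """Return P(y* | x), the probability of the best label sequence.

    emissions[t][j]   : log-potential of label j at position t (assumed input)
    transitions[i][j] : log-potential of moving from label i to label j
    """
    n_labels = len(emissions[0])
    best = list(emissions[0])   # Viterbi scores (max over paths)
    total = list(emissions[0])  # forward scores (log-sum over paths)
    for t in range(1, len(emissions)):
        best = [
            max(best[i] + transitions[i][j] for i in range(n_labels))
            + emissions[t][j]
            for j in range(n_labels)
        ]
        total = [
            logsumexp([total[i] + transitions[i][j] for i in range(n_labels)])
            + emissions[t][j]
            for j in range(n_labels)
        ]
    # Best-path score minus the log partition function.
    return math.exp(max(best) - logsumexp(total))


def select_least_confident(confidences: Dict[str, float], k: int) -> List[str]:
    """Pick the k sample ids whose best labeling is least probable."""
    ranked = sorted(confidences, key=lambda sample_id: confidences[sample_id])
    return ranked[:k]


if __name__ == "__main__":
    # Toy pool: two labels, two tokens; scores are made up for illustration.
    flat = sequence_confidence([[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]])
    sharp = sequence_confidence([[5.0, 0.0], [5.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]])
    pool = {"post_flat": flat, "post_sharp": sharp}
    print(select_least_confident(pool, 1))
```

In an active learning loop, the selected samples would be annotated by hand, added to the training set, and the model retrained before the next round of selection.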
