Conference: International Symposium on Artificial Intelligence & Signal Processing

Semi-supervised learning for named entity recognition using weakly labeled training data

Abstract

The shortage of annotated training data remains an important challenge for many Natural Language Processing (NLP) tasks, such as Named Entity Recognition (NER). NER requires a large amount of training data with a high degree of human supervision, yet sufficient labeled data is not available for every language. In this paper, we use an unlabeled bilingual corpus to extract useful features by transferring information from a resource-rich language to a resource-poor language; using these features and a small amount of training data, we build a supervised NER model. We then apply a graph-based semi-supervised learning method that trains a CRF-based supervised classifier on the labeled data and uses high-confidence predictions on the unlabeled data to expand the training set, improving the efficiency of the NER model with the new training set.
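
The idea of expanding the training set with high-confidence predictions can be illustrated with a minimal self-training sketch. This is not the paper's implementation: a toy per-token frequency tagger stands in for the CRF-based classifier, and the graph-based component and the bilingual features are omitted. The function names (`train`, `predict`, `self_train`), the 0.9 confidence threshold, and the example sentences are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train(labeled):
    """Toy stand-in for a CRF: count how often each token carries each tag."""
    counts = defaultdict(Counter)
    for tokens, tags in labeled:
        for token, tag in zip(tokens, tags):
            counts[token.lower()][tag] += 1
    return counts


def predict(model, tokens, default_tag="O"):
    """Tag a sentence; confidence is the lowest per-token tag probability."""
    tags, confidence = [], 1.0
    for token in tokens:
        tag_counts = model.get(token.lower())
        if not tag_counts:
            # Unseen token: guess the default tag with low confidence (assumed 0.5).
            tags.append(default_tag)
            confidence = min(confidence, 0.5)
            continue
        tag, count = tag_counts.most_common(1)[0]
        tags.append(tag)
        confidence = min(confidence, count / sum(tag_counts.values()))
    return tags, confidence


def self_train(labeled, unlabeled, threshold=0.9, max_rounds=5):
    """Retrain repeatedly, moving confidently tagged sentences into the training set."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        model = train(labeled)
        remaining, added = [], 0
        for tokens in pool:
            tags, confidence = predict(model, tokens)
            if confidence >= threshold:
                labeled.append((tokens, tags))
                added += 1
            else:
                remaining.append(tokens)
        pool = remaining
        if added == 0:
            break  # nothing new was confident enough, so stop early
    return train(labeled), labeled, pool


# Hypothetical example data, not taken from the paper.
seed = [
    (["Paris", "is", "nice"], ["B-LOC", "O", "O"]),
    (["Tehran", "is", "big"], ["B-LOC", "O", "O"]),
]
raw = [["Paris", "is", "big"], ["Berlin", "is", "nice"]]
model, expanded, remaining = self_train(seed, raw)
```

In this toy run, the sentence made up entirely of tokens seen in the seed data is added to the training set, while the sentence containing an unseen token stays in the unlabeled pool. A real system would replace `train` and `predict` with a sequence model that reports calibrated sequence-level confidence.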