Journal: Knowledge-Based Systems

A combination of active learning and self-learning for named entity recognition on Twitter using conditional random fields

Abstract

In recent years, many applications in natural language processing (NLP) have been developed using the machine learning approach. Annotating data is an important task in applying machine learning to NLP applications. A common approach to improve the system performance is to train on a large and high-quality set of training data that is annotated by experts. Besides, active learning (AL) and self-learning can be utilized to reduce the annotation costs. The self-learning method discovers highly reliable instances based on a trained classifier, while AL queries the most informative instances based on active query algorithms. This paper proposes a method that combines AL and self-learning to reduce the labeling effort for the named entity recognition task from tweet streams by using both machine-labeled and manually-labeled data. We employ AL queries based on the diversity of the context and content of instances to select the most informative instances. The conditional random fields are also chosen as an underlying model to train a classifier for selecting highly reliable instances. The experiments using Twitter data show that the proposed method achieves good results in reducing the human labeling effort, and it can significantly improve the performance of the systems. (C) 2017 Elsevier B.V. All rights reserved.
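The combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Tagger` interface, the confidence threshold, the query budget, and the use of Jaccard distance over token sets as a stand-in for "diversity of content" are all assumptions made for illustration. Any trained sequence labeler (for example a CRF) that returns labels plus a confidence score could play the role of `tagger`.

```python
from typing import Callable, List, Sequence, Tuple

Tokens = List[str]
Labels = List[str]
# Hypothetical interface: a trained tagger (e.g. a CRF wrapper) that returns
# the predicted labels for a tweet and one confidence score in [0, 1].
Tagger = Callable[[Tokens], Tuple[Labels, float]]


def jaccard_distance(a: Sequence[str], b: Sequence[str]) -> float:
    """Distance between two token sets: 0.0 means identical, 1.0 disjoint."""
    sa, sb = set(a), set(b)
    union = sa | sb
    if not union:
        return 0.0
    return 1.0 - len(sa & sb) / len(union)


def route_pool(
    pool: Sequence[Tokens],
    tagger: Tagger,
    confidence_threshold: float = 0.9,  # assumed value, not from the paper
    query_budget: int = 2,              # assumed value, not from the paper
) -> Tuple[List[Tuple[Tokens, Labels]], List[Tokens]]:
    """Split an unlabeled pool into machine-labeled and to-be-annotated parts.

    Self-learning step: instances the tagger labels with high confidence are
    kept with their machine-predicted labels.
    Active-learning step: among the remaining instances, greedily pick up to
    `query_budget` that are most dissimilar to those already picked
    (a simple diversity heuristic, assumed here for illustration).
    """
    machine_labeled: List[Tuple[Tokens, Labels]] = []
    candidates: List[Tokens] = []
    for tokens in pool:
        labels, confidence = tagger(tokens)
        if confidence >= confidence_threshold:
            machine_labeled.append((tokens, labels))
        else:
            candidates.append(tokens)

    to_annotate: List[Tokens] = []
    while candidates and len(to_annotate) < query_budget:
        if not to_annotate:
            pick = candidates[0]
        else:
            # Choose the candidate farthest from its nearest selected instance.
            pick = max(
                candidates,
                key=lambda c: min(jaccard_distance(c, s) for s in to_annotate),
            )
        to_annotate.append(pick)
        candidates.remove(pick)
    return machine_labeled, to_annotate
```

In a full loop, the manually annotated instances and the machine-labeled ones would both be added to the training set, the tagger retrained, and the routing repeated on the next batch of tweets.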
