Journal: Intelligent Decision Technologies

Uncertainty query sampling strategies for active learning of named entity recognition task

Abstract

Active learning is a well-known approach for labeling large unannotated datasets with minimal effort and in a cost-efficient way. It iteratively selects the most informative instances and adds them to the training set so that the learner's performance improves with each iteration. Named entity recognition (NER) is a key information extraction task in which the entities present in a sequence are labeled with their correct class. Traditional query sampling strategies for active learning consider only the final probability value of the model when selecting the most informative instances. In this paper, we propose a new active learning algorithm based on a hybrid query sampling strategy that considers sentence similarity along with the final probability value of the model, and we compare it with four other well-known pool-based uncertainty query sampling strategies for NER: least confident sampling, margin of confidence sampling, ratio of confidence sampling, and entropy sampling. The experiments were performed on three biomedical NER datasets from different domains and on a Spanish-language NER dataset. We found that all of these approaches reach the performance of the supervised learning approach while requiring much less annotated training data. In most cases, the proposed active learning algorithm performs well and further reduces the annotation cost compared with the active learning algorithms based on the other sampling strategies.
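The four baseline uncertainty strategies named in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation. The formulas are the commonly used normalized definitions (higher score means more uncertain). Averaging token scores into a sentence score is an assumption made here for illustration, and the function names `sentence_score` and `select_queries` are hypothetical. The proposed hybrid strategy is not sketched, because the abstract does not specify how sentence similarity is combined with the probability value.

```python
import math
from typing import Callable, List, Sequence

# One probability distribution over NER labels for a single token.
Dist = Sequence[float]


def least_confidence(p: Dist) -> float:
    """1 - max probability, rescaled to [0, 1] for n >= 2 labels."""
    n = len(p)
    return (1.0 - max(p)) * n / (n - 1)


def margin_of_confidence(p: Dist) -> float:
    """1 - (gap between the two most probable labels)."""
    top1, top2 = sorted(p, reverse=True)[:2]
    return 1.0 - (top1 - top2)


def ratio_of_confidence(p: Dist) -> float:
    """Second most probable label divided by the most probable one."""
    top1, top2 = sorted(p, reverse=True)[:2]
    return top2 / top1


def entropy_score(p: Dist) -> float:
    """Shannon entropy divided by its maximum, log2(n)."""
    h = -sum(q * math.log2(q) for q in p if q > 0.0)
    return h / math.log2(len(p))


def sentence_score(token_dists: Sequence[Dist],
                   strategy: Callable[[Dist], float]) -> float:
    """Assumption: a sentence's uncertainty is the mean of its token scores."""
    return sum(strategy(p) for p in token_dists) / len(token_dists)


def select_queries(pool: Sequence[Sequence[Dist]],
                   strategy: Callable[[Dist], float],
                   k: int) -> List[int]:
    """Return indices of the k most uncertain sentences in the unlabeled pool."""
    scores = [sentence_score(s, strategy) for s in pool]
    ranked = sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]
```

In a pool-based loop, the current model would produce per-token label distributions for every unlabeled sentence, `select_queries` would pick the `k` sentences to send for annotation, and the model would be retrained on the enlarged training set before the next iteration.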