Conference: CCF International Conference on Natural Language Processing and Chinese Computing

Applying Model Fusion to Augment Data for Entity Recognition in Legal Documents

Abstract

Named entity recognition for legal documents is a basic and crucial task that provides important knowledge for related tasks in the field of smart justice. However, automatically augmenting labeled named-entity data for legal documents remains difficult. To address this issue, we propose a novel data augmentation method for named entity recognition that fuses multiple models. First, we train a total of ten models by conducting 5-fold cross-training on small-scale labeled datasets, using the BiLSTM-CRF and BERT-BiLSTM-CRF models separately. Next, we apply two fusion modes: single-model fusion votes on the predictions of the five models that share a baseline, while multi-model fusion votes on the predictions of all ten models from the two baselines. We then treat the entities identified with high correctness across the multiple experimental results as effective entities and add them to the training set for the next round of training. Finally, we conduct experiments on two public datasets and on a judicial dataset that we built. The results with data augmentation are close to those obtained with five times as much labeled data, and clearly better than those obtained with the initial small-scale labeled datasets.
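The voting step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the span representation `(start, end, label)`, the function name `vote_entities`, and the `min_votes` threshold are all assumptions made for illustration, since the abstract does not state how many votes count as "high correctness".

```python
from collections import Counter

# Hypothetical entity representation: (start offset, end offset, label).
Entity = tuple[int, int, str]


def vote_entities(predictions: list[set[Entity]], min_votes: int) -> set[Entity]:
    """Keep entities predicted by at least `min_votes` of the models.

    `predictions` holds one set of entities per model for the same sentence.
    With five sets this mimics single-model fusion; with ten sets (five per
    baseline) it mimics multi-model fusion.
    """
    counts = Counter(entity for pred in predictions for entity in pred)
    return {entity for entity, votes in counts.items() if votes >= min_votes}


# Made-up predictions from five fold models of one baseline.
fold_predictions = [
    {(0, 2, "PER"), (5, 8, "ORG")},
    {(0, 2, "PER"), (5, 8, "ORG")},
    {(0, 2, "PER")},
    {(0, 2, "PER"), (5, 8, "LOC")},
    {(0, 2, "PER"), (5, 8, "ORG")},
]

# Assumed threshold: accept an entity only if 4 of the 5 models agree.
accepted = vote_entities(fold_predictions, min_votes=4)
```

Under these assumptions, the accepted entities would be attached to the previously unlabeled sentence and appended to the training set for the next round. A stricter threshold trades recall of new entities for fewer label errors.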
