Conference: IEEE International Conference on Big Data

Japanese Mistakable Legal Term Correction using Infrequency-aware BERT Classifier

Abstract

We propose a method that assists legislative drafters in locating inappropriate legal terms in Japanese statutory sentences and suggests corrections. We focus on sets of mistakable legal terms whose usages are defined in legislation drafting rules. Our method predicts suitable legal terms using a classifier based on a BERT (Bidirectional Encoder Representations from Transformers) model. We apply three techniques in training the BERT classifier, specifically, preliminary domain adaptation, repetitive soft undersampling, and classifier unification. These techniques cope with two levels of infrequency: legal term-level infrequency that causes class imbalance and legal term set-level infrequency that causes underfitting. Concretely, preliminary domain adaptation improves overall performance by providing prior knowledge of statutory sentences, repetitive soft undersampling improves performance on infrequent legal terms without sacrificing performance on frequent legal terms, and classifier unification improves performance on infrequent legal term sets by sharing common knowledge among legal term sets. Our experiments show that our classifier outperforms conventional classifiers using Random Forest or a language model, and that all three training techniques contribute to performance improvement.
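
The abstract names repetitive soft undersampling but does not spell out the procedure, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes that "soft" means frequent classes are shrunk toward the rarest class rather than cut all the way down to it, and that "repetitive" means a fresh subsample is drawn for every training epoch so that frequent-class examples are still covered over time. The function names, the `alpha` parameter, and the power-law target size are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def soft_undersample(examples, alpha=0.5, rng=None):
    """Draw one softened undersample of (text, label) pairs.

    ASSUMPTION (not from the paper): a class with n items keeps
    round(n_min * (n / n_min) ** alpha) items, where n_min is the
    size of the rarest class. alpha=0 balances all classes fully
    (hard undersampling); alpha=1 keeps every example.
    """
    rng = rng or random.Random()
    by_label = defaultdict(list)
    for example in examples:
        by_label[example[1]].append(example)

    n_min = min(len(items) for items in by_label.values())
    sample = []
    for items in by_label.values():
        target = round(n_min * (len(items) / n_min) ** alpha)
        target = min(len(items), target)  # never ask for more than exists
        sample.extend(rng.sample(items, target))  # without replacement
    rng.shuffle(sample)
    return sample


def epoch_samples(examples, epochs, alpha=0.5, seed=0):
    """Yield a freshly drawn subsample for each epoch (the 'repetitive' part)."""
    rng = random.Random(seed)
    for _ in range(epochs):
        yield soft_undersample(examples, alpha, rng)


if __name__ == "__main__":
    # Placeholder data: 100 examples of a frequent term, 4 of a rare one.
    data = [(f"sentence {i}", "frequent_term") for i in range(100)]
    data += [(f"sentence {i}", "rare_term") for i in range(100, 104)]
    for epoch, subset in enumerate(epoch_samples(data, epochs=3)):
        print(epoch, len(subset))
```

In a training loop, each yielded subset would feed one epoch of classifier fine-tuning. Because the draw changes every epoch, the rare term is always fully represented while different frequent-term sentences rotate in, which is one way to read the abstract's claim of helping infrequent terms without sacrificing frequent ones.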
