Conference: International Conference on Neural Information Processing

Offensive Sentence Classification Using Character-Level CNN and Transfer Learning with Fake Sentences

Abstract

Classifying offensive sentences poses two difficulties: offensive terms are easily modified, and general offensive corpora are class-imbalanced. To address both problems, we propose pre-training the convolution layers on fake sentences generated at the character level, which prevents under-fitting caused by data shortage and deals with the data imbalance. We insert offensive words into half of the randomly generated sentences and train a convolutional neural network (CNN) on these sentences, labeled by whether an offensive word is included. We then use the trained CNN filters to train a new CNN on the original data, which effectively increases the amount of training data. On three datasets (the insult dataset from Kaggle, Bullying trace, and Formspring), the proposed method achieves a higher F1-score than training without pre-training.
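
The fake-sentence step described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not a detail from the paper: the placeholder word list `OFFENSIVE_WORDS`, the alphabet, the fixed sentence length of 40 characters, and the helper names. The sketch only builds the labeled pre-training corpus; the CNN itself is left out.

```python
import random
import string

# Hypothetical placeholder lexicon; the paper's actual word list is not given here.
OFFENSIVE_WORDS = ["xbadx", "yrudey", "zmeanz"]
# Assumed character set: lowercase letters plus space.
ALPHABET = string.ascii_lowercase + " "


def contains_offensive(sentence):
    """True if any placeholder offensive word occurs in the sentence."""
    return any(word in sentence for word in OFFENSIVE_WORDS)


def random_sentence(rng, length):
    """Return `length` characters drawn uniformly from ALPHABET."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def make_fake_corpus(n_sentences, length=40, seed=0):
    """Build (sentence, label) pairs; label 1 means an offensive word was inserted."""
    rng = random.Random(seed)
    corpus = []
    for i in range(n_sentences):
        sentence = random_sentence(rng, length)
        label = i % 2  # every other sentence is positive, so half carry a word
        if label == 1:
            word = rng.choice(OFFENSIVE_WORDS)
            pos = rng.randrange(length - len(word) + 1)
            # Overwrite a span so every sentence keeps the same length.
            sentence = sentence[:pos] + word + sentence[pos + len(word):]
        else:
            # Redraw in the unlikely case random noise spells a listed word.
            while contains_offensive(sentence):
                sentence = random_sentence(rng, length)
        corpus.append((sentence, label))
    rng.shuffle(corpus)
    return corpus
```

Calling `make_fake_corpus(1000)` would yield a balanced set of 500 positive and 500 negative character sequences. Because the labels are balanced by construction, a corpus like this can be made as large as needed, which is how pre-training on it could offset both the data shortage and the class imbalance mentioned above.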