Venue: Conference on Empirical Methods in Natural Language Processing

Interpreting Neural Network Hate Speech Classifiers

Abstract

Deep neural networks have been applied to hate speech detection with apparent success, but they have limited practical applicability without transparency into the predictions they make. In this paper, we perform several experiments to visualize and understand a state-of-the-art neural network classifier for hate speech (Zhang et al., 2018). We adapt techniques from computer vision to visualize sensitive regions of the input stimuli and identify the features learned by individual neurons. We also introduce a method to discover the keywords that are most predictive of hate speech. Our analyses explain the aspects of neural networks that work well and point out areas for further improvement.
