
The Thin Line Between Hate and Profanity


Abstract

Hate speech can be defined as language used to demean people within a specific group. Hate speech often contains explicitly profane words; however, the presence of such words does not always mean that a text instance is hateful. In some cases, text instances with profane words are merely offensive language: they do not target any specific group and so cannot be classified as hate speech. In this work, we build on existing studies to find a better demarcation between hate speech and offensive language. Our main contribution is the use of typed dependencies as new features in our feature set. This new feature enables us to capture relationships between words that are far apart in a text instance, thereby providing more identifying information than single-word-based features. We evaluate our approach on a dataset with three classes: hate, offensive, and neither. Compared with existing studies, our feature set is much smaller, yet we achieve better accuracy and show comparable results in further analysis. Our detailed analysis also identified instances that were misclassified by the lexical features but correctly predicted by the proposed feature set.
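The abstract does not specify how typed-dependency features are constructed. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline: each dependency triple in the Stanford-style form rel(governor, dependent) becomes one string feature, and the parses are hand-written (hypothetical) rather than produced by a parser. The helper names `typed_dependency_features` and `unigram_features` are illustrative only. It shows why such features can separate a profane word aimed at a group from the same word aimed at an object, which single-word features cannot.

```python
from collections import Counter
from typing import List, Tuple

# One typed dependency in Stanford-style notation: (relation, governor, dependent).
# ASSUMPTION: parses are supplied by hand here; a real system would use a parser.
Triple = Tuple[str, str, str]


def typed_dependency_features(triples: List[Triple], lowercase: bool = True) -> Counter:
    """Hypothetical helper: bag of 'rel(governor,dependent)' feature strings."""
    feats: Counter = Counter()
    for rel, gov, dep in triples:
        if lowercase:
            gov, dep = gov.lower(), dep.lower()
        feats[f"{rel}({gov},{dep})"] += 1
    return feats


def unigram_features(tokens: List[str]) -> Counter:
    """Hypothetical single-word baseline, for comparison."""
    return Counter(t.lower() for t in tokens)


# Hypothetical example 1: a profane word attached to a group noun.
tokens = ["I", "really", "hate", "those", "damn", "people"]
parse = [
    ("nsubj", "hate", "I"),
    ("advmod", "hate", "really"),
    ("det", "people", "those"),
    ("amod", "people", "damn"),
    ("dobj", "hate", "people"),  # links 'hate' to 'people' despite 2 words between them
]

# Hypothetical example 2: the same profane word with no group target.
offensive_tokens = ["This", "damn", "printer", "is", "broken", "again"]
offensive_parse = [
    ("det", "printer", "This"),
    ("amod", "printer", "damn"),
    ("nsubj", "broken", "printer"),
    ("cop", "broken", "is"),
    ("advmod", "broken", "again"),
]

if __name__ == "__main__":
    # Unigrams: both sentences share the feature 'damn'.
    shared = set(unigram_features(tokens)) & set(unigram_features(offensive_tokens))
    print("shared unigrams:", sorted(shared))
    # Typed dependencies: 'damn' modifies different heads in the two sentences.
    for f in sorted(typed_dependency_features(parse)):
        print(f)
    for f in sorted(typed_dependency_features(offensive_parse)):
        print(f)
```

Both sentences contain the unigram `damn`, but the dependency features `amod(people,damn)` and `amod(printer,damn)` differ, and `dobj(hate,people)` records what is being hated regardless of how many words separate the verb from its object.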
