FlorUniTo@TRAC-2: Retrofitting Word Embeddings on an Abusive Lexicon for Aggressive Language Detection

Abstract

This paper describes our participation in the TRAC-2 Shared Tasks on Aggression Identification. Our team, FlorUniTo, investigated whether an abusive lexicon can be used to enhance word embeddings and thereby improve the detection of aggressive language. The embeddings used in our paper are word-aligned pre-trained vectors for English, Hindi, and Bengali, reflecting the languages represented in the shared task datasets. The embeddings are retrofitted to a multilingual abusive lexicon, HurtLex. We experimented with an LSTM model using the original as well as the transformed embeddings, across different language and setting variations. Overall, our systems placed toward the middle of the official rankings based on weighted F1 score. Furthermore, the results on the development and test sets show promise for this novel avenue of research.
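
The retrofitting step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard iterative update of Faruqui et al. (2015) with equal weight on the original vector and on the mean of the lexicon neighbours; the abstract does not state which variant or hyperparameters the authors used. The toy words `term_a`, `term_b`, `neutral`, the two-dimensional vectors, and the neighbour graph are all hypothetical stand-ins for the pre-trained embeddings and for HurtLex entries.

```python
import numpy as np


def retrofit(vectors, lexicon, iterations=10):
    """Pull each vector toward its lexicon neighbours while staying
    close to its original value (assumed weights: alpha = 1 for the
    original vector, beta = 1/degree for each neighbour)."""
    new_vectors = {w: v.copy() for w, v in vectors.items()}
    for _ in range(iterations):
        for word, neighbours in lexicon.items():
            if word not in vectors:
                continue
            # Keep only neighbours that actually have an embedding.
            linked = [n for n in neighbours if n in vectors and n != word]
            if not linked:
                continue
            neighbour_mean = np.mean([new_vectors[n] for n in linked], axis=0)
            # With alpha = 1 and beta = 1/degree the update reduces to the
            # average of the original vector and the neighbour mean.
            new_vectors[word] = (vectors[word] + neighbour_mean) / 2.0
    return new_vectors


# Hypothetical toy data: two lexicon terms linked to each other, one unlinked word.
vectors = {
    "term_a": np.array([1.0, 0.0]),
    "term_b": np.array([0.0, 1.0]),
    "neutral": np.array([1.0, 1.0]),
}
lexicon = {"term_a": ["term_b"], "term_b": ["term_a"]}

retrofitted = retrofit(vectors, lexicon)
dist_before = np.linalg.norm(vectors["term_a"] - vectors["term_b"])
dist_after = np.linalg.norm(retrofitted["term_a"] - retrofitted["term_b"])
print(dist_before, dist_after)
```

In this sketch the two linked terms move closer together while the unlinked word keeps its original vector, which is the intended effect of retrofitting embeddings to a lexicon. The retrofitted vectors would then replace the originals as input to a downstream classifier such as the LSTM described above.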

Bibliographic details

Source: Workshop on Trolling, Aggression and Cyberbullying, 2020, pp. 106-112 (7 pages)
Authors: Anna Koufakou; Valerio Basile; Viviana Patti
Original format: PDF
Keywords: embeddings; retrofitting; abusive lexicon
