
JCT at SemEval-2020 Task 12: Offensive Language Detection in Tweets using Preprocessing Methods, Character and Word N-grams


Abstract

In this paper, we describe our submissions to the SemEval-2020 contest. We tackled Task 12, "Multilingual Offensive Language Identification in Social Media". We developed different models for four languages: Arabic, Danish, Greek, and Turkish. We applied three supervised machine learning methods using various combinations of character and word n-gram features. In addition, we applied various combinations of basic preprocessing methods. Our best submission was a Random Forest model for offensive language identification in Danish. This model ranked 6th out of 39 submissions. Our result is only 0.0025 lower than the result of the team that took 4th place using entirely non-neural methods. Our experiments indicate that character n-gram features are more helpful than word n-gram features. This probably occurs because tweets are characterized more by characters than by words: they are short and contain various special character sequences, e.g., hashtags, shortcuts, slang words, and typos.
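The abstract's explanation of why character n-grams can help more than word n-grams on tweets can be illustrated with a minimal sketch. It is not the authors' pipeline: the two example tweets, the function names, the choice of n = 3, and the Jaccard overlap measure are all assumptions made for illustration.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count overlapping character n-grams, keeping spaces and symbols such as '#'."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(text: str, n: int) -> Counter:
    """Count overlapping word n-grams after a plain whitespace split."""
    tokens = text.split()
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def jaccard(a: Counter, b: Counter) -> float:
    """Jaccard similarity between the sets of distinct n-grams in a and b."""
    union = set(a) | set(b)
    return len(set(a) & set(b)) / len(union) if union else 0.0


# Hypothetical tweets (not from the task data): the second one adds
# letter repetition, a typo, and a hashtag.
clean = "you are so stupid"
noisy = "you are sooo stupidd #stupid"

char_sim = jaccard(char_ngrams(clean, 3), char_ngrams(noisy, 3))
word_sim = jaccard(word_ngrams(clean, 1), word_ngrams(noisy, 1))

print(f"char 3-gram overlap: {char_sim:.2f}")
print(f"word 1-gram overlap: {word_sim:.2f}")
```

In this toy case the noisy variants "sooo", "stupidd", and "#stupid" share no word unigram with the clean tweet, while most of their character trigrams still match, so the character-level overlap comes out higher. A real system would feed such counts into a classifier rather than compare two strings directly.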


Bibliographic details

Source: International Workshop on Semantic Evaluation, 2020, pp. 2017-2022 (6 pages)
Authors: Moshe Uzan; Yaakov HaCohen-Kerner
Original format: PDF

