Hybrid ensemble approaches to online harassment detection in highly imbalanced data

Tolba Marwa; Ouadfel Salima; Meshoul Souham


Abstract

Online harassment is a major threat to users of social media platforms, especially young adults and women. It can cause mental illness, and it deeply harms economic institutions targeted by cyberbullying attacks, which can lose both credibility and business. This makes automatic detection of online harassment extremely important. Most current studies in this context apply machine-learning algorithms that assume a balanced class distribution. However, this assumption does not hold for most real datasets. This research provides a comprehensive investigation of approaches that combine diverse techniques along three dimensions: feature representation, imbalanced-data handling, and supervised learning. For the first dimension, three word-embedding models were considered: word2vec, GloVe, and SSWE. For the other two dimensions, nine techniques for balancing skewed class distributions were used to feed several learning models; in particular, resampling methods, cost-sensitive learning, and Weight-Selection strategy-based methods were combined with deep neural networks. The ultimate goal of this study is to evaluate how well such hybrid approaches handle online harassment detection on highly imbalanced Twitter data and to select the best combination for this purpose. An extensive comparative study was conducted, and the results are discussed in terms of three evaluation metrics widely used for imbalanced classification. The main findings are that GloVe is the best feature representation and that certain combinations perform best, most notably LSTM and BLSTM with cost-sensitive learning and the VL strategy.
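The abstract names resampling and cost-sensitive learning without detailing how either works. As a minimal sketch, assuming binary labels (1 = harassment, 0 = not harassment) and plain Python instead of the deep models used in the paper, the block below shows random oversampling of the minority class and one common cost-sensitive scheme: scaling each example's cross-entropy loss by a class weight of n_samples / (n_classes * class_count), the same formula scikit-learn uses for `class_weight="balanced"`. The function names and toy data are hypothetical; the paper's own weighting and resampling settings are not given in the abstract.

```python
import math
import random
from collections import Counter

# Hypothetical helpers illustrating generic imbalanced-learning techniques;
# this is NOT the authors' implementation.

def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count): rarer classes weigh more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def weighted_bce(y_true, p_pred, weights, eps=1e-12):
    """Cost-sensitive binary cross-entropy: each example's loss is scaled by its class weight."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip probabilities to avoid log(0)
        loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        total += weights[y] * loss
    return total / len(y_true)

def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen examples of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y

# Toy data: 9 non-harassment tweets (0) vs 1 harassment tweet (1), a 9:1 imbalance.
labels = [0] * 9 + [1]
tweets = [f"tweet_{i}" for i in range(10)]

w = inverse_frequency_weights(labels)
print(w)  # class 1 (rare) gets weight 5.0; class 0 gets about 0.56

X_bal, y_bal = random_oversample(tweets, labels)
print(Counter(y_bal))  # Counter({0: 9, 1: 9})
```

With these weights, misclassifying the single harassment tweet costs nine times as much as misclassifying one of the majority tweets, which pushes a model away from simply predicting the majority class. Oversampling reaches a similar effect by changing the data rather than the loss.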


Bibliographic details

Source
Expert Systems with Applications | 2021, issue 8 | pp. 114751.1-114751.13 | 13 pages
Authors
Tolba Marwa; Ouadfel Salima; Meshoul Souham
Affiliations

Abdelhamid Mehri University Constantine 2, Faculty of Information and Communication Technology, Department of Computer Science and Its Applications, Modelling and Implementation of Complex Systems Laboratory (MISC Lab), El Khroub, Algeria

Abdelhamid Mehri University Constantine 2, Faculty of Information and Communication Technology, Department of Computer Science and Its Applications, El Khroub, Algeria

Princess Nourah bint Abdulrahman University, College of Computer and Information Sciences, Riyadh, Saudi Arabia
Original format
PDF
Language
English
Keywords
Online harassment detection; Imbalanced learning; Word embedding; Deep learning; Imbalanced Twitter data


Similar articles

1. Online breakage detection of multitooth tools using classifier ensembles for imbalanced data [J]. Andres Bustillo, Juan J. Rodriguez. International Journal of Systems Science, 2014, issue 10a12.

2. RNN-Based online anomaly detection in nuclear reactors for highly imbalanced datasets with uncertainty [J]. Kim Minhee, Ou Elisa, Loh Po-Ling, et al. Nuclear Engineering and Design, 2020, issue Auga.

3. An online ensembles approach for handling concept drift in data streams: diversified online ensembles detection [J]. Parneeta Sidhu, M. P. S. Bhatia. International Journal of Machine Learning and Cybernetics, 2015, issue 6.

4. ECG Heartbeat Classification Using Ensemble of Efficient Machine Learning Approaches on Imbalanced Datasets [C]. Md. Atik Ahamed, Kazi Amit Hasan, Khan Fashee Monowar, et al. International Conference on Advanced Information and Communication Technology, 2020.

5. Diversified ensemble classifiers for highly imbalanced data learning and its application in bioinformatics [D]. Zejin Ding. 2011.

6. An empirical study of ensemble-based semi-supervised learning approaches for imbalanced splice site datasets [O]. Ana Stanescu, Doina Caragea. 2015.

7. Android Ransomware Detection Based on a Hybrid Evolutionary Approach in the Context of Highly Imbalanced Data [O]. Iman Almomani, Raneem Qaddoura, Maria Habib, et al. 2021.
