...
Journal: Expert Systems with Applications

Hybrid ensemble approaches to online harassment detection in highly imbalanced data



Abstract

Online harassment is a major threat to users of social media platforms, especially young adults and women. It can cause mental illness, and institutions that experience cyberbullying attacks can suffer serious economic harm through lost credibility and business. This makes automatic detection of online harassment extremely important. Most current studies in this context apply machine-learning algorithms that assume a balanced class distribution. However, this assumption does not hold for most real datasets. This research provides a comprehensive investigation of approaches that combine diverse techniques along three dimensions: feature representation, imbalanced data handling, and supervised learning. For the first dimension, three word-embedding models are considered: word2vec, GloVe, and SSWE. For the other two dimensions, nine techniques for balancing skewed class distributions are employed to feed several learning models. In particular, resampling methods, cost-sensitive learning, and Weight-Selection strategy-based methods are used with deep neural networks. The ultimate goal of this study is to evaluate the potential of such hybrid approaches to handle the online harassment detection task efficiently on highly imbalanced Twitter data, and to select the best combination for the intended purpose. An extensive comparative study has been conducted, and the results are discussed in terms of three evaluation metrics widely used for imbalanced classification. The main findings are that GloVe is the best feature representation and that certain combinations perform best, most notably LSTM and BLSTM with cost-sensitive learning and the VL strategy.
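The abstract names resampling and cost-sensitive learning as ways of handling class imbalance but does not describe them in detail. The following is a minimal illustrative sketch of the two general ideas, not the authors' implementation: the function names, the toy data, and the inverse-frequency weight formula (a common "balanced" heuristic) are assumptions introduced here, and the nine techniques actually used in the study are not specified in this listing.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Cost-sensitive idea: weight each class by inverse frequency.

    Uses the common heuristic w_c = n / (k * n_c), where n is the number
    of samples, k the number of classes, and n_c the size of class c.
    (Assumed formula; the paper's cost scheme may differ.)
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}


def random_oversample(samples, labels, seed=0):
    """Resampling idea: duplicate randomly chosen minority samples
    until every class is as large as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for c, n_c in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == c]
        extra = [rng.choice(pool) for _ in range(target - n_c)]
        out_x.extend(extra)
        out_y.extend([c] * len(extra))
    return out_x, out_y


# Hypothetical toy data: 9 benign tweets (label 0), 1 harassing tweet (label 1).
labels = [0] * 9 + [1]
samples = [f"tweet_{i}" for i in range(10)]

weights = balanced_class_weights(labels)           # rare class gets the larger weight
res_x, res_y = random_oversample(samples, labels)  # both classes end up with 9 samples
```

In a cost-sensitive setup, weights like these would typically scale each sample's contribution to the training loss, whereas oversampling changes the training set itself and leaves the loss untouched.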
