Conference: UK Workshop on Computational Intelligence

Harnessing the Power of Text Mining for the Detection of Abusive Content in Social Media


Abstract

The issues of cyberbullying and online harassment have gained considerable coverage in recent years. Social media providers need to be able to detect abusive content both accurately and efficiently in order to protect their users. Our aim is to investigate the application of core text mining techniques for the automatic detection of abusive content across a range of social media sources, including blogs, forums, media-sharing, Q&A and chat, using datasets from Twitter, YouTube, MySpace, Kongregate, Formspring and Slashdot. Using supervised machine learning, we compare alternative text representations and dimension reduction approaches, including feature selection and feature enhancement, demonstrating the impact of these techniques on detection accuracies. In addition, we investigate the need for sampling on imbalanced datasets. Our conclusions are: (1) Dataset balancing boosts accuracies significantly for social media abusive content detection; (2) Feature reduction, important for the large feature sets that are typical of social media datasets, improves efficiency whilst maintaining detection accuracies; (3) The use of generic structural features common across all our datasets proved to be of limited use in the automatic detection of abusive content. Our findings can support practitioners in selecting appropriate text mining strategies in this area.
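To make two of the steps named in the abstract concrete, the following is a minimal sketch of dataset balancing followed by feature selection on a bag-of-words representation. The abstract does not say which sampling or selection methods were used, so random undersampling and chi-square term scoring are assumptions chosen for illustration, as are the toy messages, labels and every function name below.

```python
import random
from collections import Counter

# Hypothetical toy data: (message, label), where 1 = abusive and 0 = not abusive.
samples = [
    ("you are an idiot", 1),
    ("stupid idiot go away", 1),
    ("thanks for the helpful answer", 0),
    ("great video thanks", 0),
    ("see you at the game", 0),
    ("the answer is in the manual", 0),
    ("nice level design", 0),
    ("you are welcome", 0),
]

def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for real preprocessing."""
    return text.lower().split()

def undersample(data, seed=0):
    """Randomly drop samples so every class is as small as the smallest one."""
    rng = random.Random(seed)
    by_label = {}
    for text, label in data:
        by_label.setdefault(label, []).append((text, label))
    n = min(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, n))
    return balanced

def chi_square_scores(data):
    """Score each term with the chi-square statistic of its 2x2 term/class table."""
    n = len(data)
    n_pos = sum(1 for _, label in data if label == 1)
    doc_freq, pos_freq = Counter(), Counter()
    for text, label in data:
        for term in set(tokenize(text)):
            doc_freq[term] += 1
            if label == 1:
                pos_freq[term] += 1
    scores = {}
    for term, df in doc_freq.items():
        a = pos_freq[term]      # abusive messages containing the term
        b = df - a              # non-abusive messages containing the term
        c = n_pos - a           # abusive messages without the term
        d = n - n_pos - b       # non-abusive messages without the term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores

def select_features(data, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    scores = chi_square_scores(data)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

balanced = undersample(samples)
print(Counter(label for _, label in balanced))
print(select_features(balanced, 3))
```

The selected terms would then define the reduced feature space passed to a supervised classifier. Undersampling discards data, so on real corpora oversampling or class weighting may be preferable; the sketch only illustrates the order of operations.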
