Detecting sexual predators in chats using behavioral features and imbalanced learning

Abstract

This paper presents a system for detecting sexual predators in online chat conversations using two-stage classification and behavioral features. A sexual predator is defined as a person who tries to obtain sexual favors in a predatory manner, usually from underage people. The proposed approach uses several text categorization methods and empirical behavioral features developed specifically for this task. After investigating various approaches to the sexual predator identification problem, we found that a two-stage classifier achieves the best results. In the first stage, we employ a Support Vector Machine classifier to distinguish conversations with suspicious content from safe online discussions. This is useful because most real-life chat conversations do not involve a sexual predator; the first stage therefore acts as a filtering phase, so that the actual detection of predators is performed only on suspicious chats that are very likely to contain a sexual predator. In the second stage, we use a Random Forest classifier to detect which of the users in a suspicious discussion is an actual predator. The system was tested on the corpus provided by the PAN 2012 workshop organizers, and the results are encouraging because, as far as we know, our solution outperforms all previous approaches developed for this task.
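
The control flow of the two-stage design described in the abstract can be sketched as follows. This is only an illustration of the cascade, not the authors' implementation: the `Message` and `Conversation` types, the function `two_stage_detect`, and the `<risk>` placeholder token are all hypothetical, and the two toy rule-based functions merely stand in for the trained Support Vector Machine (stage 1) and Random Forest (stage 2) models, whose features the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class Message:
    author: str
    text: str


@dataclass
class Conversation:
    conv_id: str
    messages: List[Message]


# Hypothetical interfaces for the two stage models.
ConversationFilter = Callable[[Conversation], bool]   # True = suspicious
UserClassifier = Callable[[List[Message]], bool]      # True = flag this user


def two_stage_detect(
    conversations: List[Conversation],
    is_suspicious: ConversationFilter,
    is_predator: UserClassifier,
) -> Set[str]:
    """Run stage 2 only on conversations that stage 1 marks as suspicious."""
    flagged: Set[str] = set()
    for conv in conversations:
        # Stage 1: discard conversations judged safe.
        if not is_suspicious(conv):
            continue
        # Stage 2: classify each participant from their own messages.
        by_author: Dict[str, List[Message]] = {}
        for msg in conv.messages:
            by_author.setdefault(msg.author, []).append(msg)
        for author, msgs in by_author.items():
            if is_predator(msgs):
                flagged.add(author)
    return flagged


# Toy stand-ins (assumptions): a real system would call trained models here.
def toy_is_suspicious(conv: Conversation) -> bool:
    return any("<risk>" in m.text for m in conv.messages)


def toy_is_predator(msgs: List[Message]) -> bool:
    risky = sum("<risk>" in m.text for m in msgs)
    return risky / len(msgs) >= 0.5


conversations = [
    Conversation("c1", [Message("alice", "hello"), Message("bob", "hi there")]),
    Conversation("c2", [
        Message("carol", "<risk> first message"),
        Message("dave", "what?"),
        Message("carol", "<risk> second message"),
    ]),
]

flagged = two_stage_detect(conversations, toy_is_suspicious, toy_is_predator)
print(sorted(flagged))
```

Passing the two stages in as callables keeps the cascade independent of the models behind it, so the filtering stage can be tuned for high recall without touching the per-user stage.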

Bibliographic details

  • Source
    Natural Language Engineering | 2017, Issue 4 | pp. 589-616 | 28 pages
  • Author affiliations

    Department of Computer Science, University Politehnica of Bucharest, 060042 Bucharest, Romania;

    Department of Computer Science, University Politehnica of Bucharest, 060042 Bucharest, Romania;

  • Original format: PDF
  • Language: English