Conference: International Workshop on Semantic and Social Media Adaptation and Personalization

On Preprocessing the Data for Improving Sexual Predator Detection : Anonymous for review


Abstract

Detecting sexual predators and identifying predatory messages are critical to protecting underage children from online abuse. In this paper, we investigate different feature extraction approaches for predator detection. Although previous results indicate good accuracy in detecting predatory conversations, the robustness of the feature space has not been investigated. We also show how preprocessing the data improves the performance of predator identification and predatory message classification. We evaluate several bag-of-words features, including binary, term frequency, and TF-IDF representations, on the publicly available PAN 2012 competition dataset for predator identification. To capture the relationships between words in the text, we also investigate GloVe word embedding features. With the proposed set of preprocessing steps, we demonstrate improved detection of predatory conversations, reaching an accuracy of 0.994 and an F1-score of 0.964.
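
The three bag-of-words representations named in the abstract (binary, term frequency, and TF-IDF) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the regex tokenizer, the toy messages, the helper names, and the unsmoothed IDF formula log(N / df). The abstract does not state which preprocessing steps or weighting variant the authors used.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed minimal preprocessing: lowercase, keep letters, digits, apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocab(docs):
    # Sorted list of every distinct token across all tokenized documents.
    return sorted({tok for doc in docs for tok in doc})


def binary_vector(doc, vocab):
    # 1 if the term occurs in the document at all, else 0.
    present = set(doc)
    return [1 if term in present else 0 for term in vocab]


def tf_vector(doc, vocab):
    # Raw count of each vocabulary term in the document.
    counts = Counter(doc)
    return [counts[term] for term in vocab]


def idf_weights(docs, vocab):
    # Unsmoothed inverse document frequency: log(N / df(term)).
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return [math.log(n_docs / doc_freq[term]) for term in vocab]


def tfidf_vector(doc, vocab, idf):
    # Term frequency scaled by the inverse document frequency.
    return [tf * w for tf, w in zip(tf_vector(doc, vocab), idf)]


# Hypothetical toy messages, not drawn from the PAN 2012 dataset.
messages = ["Hi there, hi!", "Where do you live?", "Do you like music?"]
docs = [tokenize(m) for m in messages]
vocab = build_vocab(docs)
idf = idf_weights(docs, vocab)

binary = [binary_vector(d, vocab) for d in docs]
tf = [tf_vector(d, vocab) for d in docs]
tfidf = [tfidf_vector(d, vocab, idf) for d in docs]
```

In this sketch the binary vector only records whether a term appears, the term frequency vector records how often, and TF-IDF down-weights terms that occur in many messages. Libraries often apply smoothing or normalization on top of this basic form, so their values may differ from the ones computed here.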
