Journal: Digital Investigation
Laying foundations for effective machine learning in law enforcement. Majura - A labelling schema for child exploitation materials

Abstract

The health impacts of repeated exposure to distressing concepts such as child exploitation materials (CEM, aka 'child pornography') have become a major concern to law enforcement agencies and associated entities. Existing methods for 'flagging' materials largely rely upon prior knowledge, whilst predictive methods are unreliable, particularly when compared with equivalent tools used for detecting 'lawful' pornography. In this paper we detail the design and implementation of a deep-learning based CEM classifier, leveraging existing pornography detection methods to overcome infrastructure and corpora limitations in this field. Specifically, we further existing research through direct access to numerous contemporary, real-world, annotated cases taken from Australian Federal Police holdings, demonstrating the dangers of overfitting due to the influence of individual users' proclivities. We quantify the performance of skin tone analysis in CEM cases, showing it to be of limited use. We assess the performance of our classifier and show it to be sufficient for use in forensic triage and 'early warning' of CEM, but of limited efficacy for categorising against existing scales for measuring child abuse severity.

We identify limitations currently faced by researchers and practitioners in this field, whose restricted access to training material is exacerbated by inconsistent and unsuitable annotation schemas. Whilst adequate for their intended use, we show existing schemas to be unsuitable for training machine learning (ML) models, and introduce a new, flexible, objective, and tested annotation schema specifically designed for cross-jurisdictional collaborative use.

This work, combined with a world-first 'illicit data airlock' project currently under construction, has the potential to bring a 'ground truth' dataset and processing facilities to researchers worldwide without compromising quality, safety, ethics and legality. © 2018 Elsevier Ltd. All rights reserved.
