Journal: Digital Investigation

A Machine Learning-based Triage methodology for automated categorization of digital media



Abstract

The global diffusion of smartphones and tablets, whose market share now exceeds that of traditional desktops and laptops, presents investigative opportunities and poses serious challenges to law enforcement agencies and forensic professionals. Traditional Digital Forensics techniques may no longer be appropriate for the timely analysis of digital devices found at the crime scene. Yet for specific crimes such as murder, child abduction, missing persons, and death threats, such timely analysis may be crucial to speed up investigations. Motivated by this, the paper explores the field of Triage, a relatively new branch of Digital Forensics intended to provide investigators with actionable intelligence through digital media inspection, and describes a new interdisciplinary approach that merges Digital Forensics techniques and Machine Learning principles. The proposed Triage methodology aims at automating the categorization of digital media on the basis of plausible connections between the traces retrieved (i.e. digital evidence) and the crimes under investigation. As an application of the proposed method, two case studies, on copyright infringement and child pornography exchange, are then presented to show that the idea is viable. Following Machine Learning terminology, the paper uses the term "feature" for a quantitative measure of a piece of "plausible digital evidence". In this regard, we (a) define a list of crime-related features, (b) identify and extract them from available devices and forensic copies, (c) populate an input matrix and (d) process it with different Machine Learning mining schemes to come up with a device classification. We benchmark the most popular mining algorithms (i.e. Bayes Networks, Decision Trees, Locally Weighted Learning and Support Vector Machines) to find the ones that best fit the case in question.
The results are encouraging: on a dataset of 13 digital media and 45 copyright infringement-related features, more than 93% of the digital media are classified correctly using Bayes Networks or Support Vector Machines, while for child pornography exchange, on a dataset of 23 cell phones and 23 crime-related features, 100% of the phones are classified correctly. In this regard, methods to reduce the number of linearly independent features are explored and classification results presented.
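
Steps (a) through (d) of the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature names, the device rows, and the class labels below are invented toy values, a Bernoulli naive Bayes classifier stands in for the Bayes Networks benchmarked in the paper, and leave-one-out evaluation is an assumption, since the abstract does not state the validation protocol.

```python
import math

# (a) Hypothetical crime-related features (binary here for simplicity).
#     These names are illustrative assumptions, not the paper's feature list.
FEATURES = ["p2p_client_installed", "many_media_files", "keygen_found"]

# (b) + (c) Toy input matrix: one row per device, one column per feature,
#     plus a class label. All values are invented for illustration only.
DEVICES = [
    ([1, 1, 1], "relevant"),
    ([1, 1, 0], "relevant"),
    ([1, 0, 1], "relevant"),
    ([1, 1, 1], "relevant"),
    ([0, 0, 0], "not_relevant"),
    ([0, 1, 0], "not_relevant"),
    ([0, 0, 0], "not_relevant"),
    ([0, 0, 1], "not_relevant"),
]


def train(rows):
    """Fit a Bernoulli naive Bayes model with Laplace (add-one) smoothing."""
    model = {}
    total = len(rows)
    for label in sorted({lab for _, lab in rows}):
        vectors = [vec for vec, lab in rows if lab == label]
        n = len(vectors)
        # Smoothed estimate of P(feature = 1 | label); never exactly 0 or 1.
        probs = [
            (sum(v[j] for v in vectors) + 1) / (n + 2)
            for j in range(len(FEATURES))
        ]
        model[label] = (math.log(n / total), probs)
    return model


def predict(model, vec):
    """Return the label with the highest log-posterior score for one device."""
    def score(label):
        log_prior, probs = model[label]
        return log_prior + sum(
            math.log(p if x else 1.0 - p) for p, x in zip(probs, vec)
        )
    return max(model, key=score)


def loo_accuracy(rows):
    """(d) Leave-one-out: train on all devices but one, test on the held-out one."""
    hits = 0
    for i, (vec, label) in enumerate(rows):
        training = rows[:i] + rows[i + 1:]
        hits += predict(train(training), vec) == label
    return hits / len(rows)


if __name__ == "__main__":
    print(f"leave-one-out accuracy: {loo_accuracy(DEVICES):.2f}")
```

Leave-one-out is chosen here only because the datasets described in the abstract are small (13 and 23 devices), where holding out a larger test split would leave very little training data. Any accuracy on this toy matrix says nothing about the paper's reported figures.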
