Journal: Digital Investigation

Machine learning based approach to analyze file meta data for smart phone file triage


Abstract

With the rapid increase in mobile phone storage capacity and penetration, digital forensic investigators face a significant challenge in quickly identifying relevant examinable files within a plethora of uninteresting OS and application files extracted by forensic tools. This challenge can have serious adverse effects in time-critical cases, and can also increase case backlogs. One possible solution is to prioritize digital artifacts, a process referred to as triage. Several digital forensic triage methodologies based on classical automation techniques, such as block hashing and regular expression matching, have been proposed. However, such techniques suffer from the significant limitation of requiring users to know and hardcode data templates and relations of interest. In the literature, more flexible machine learning based approaches have been proposed to classify whether a mobile device, rather than a mobile device artifact, is of interest based on its usage metrics and file-system metadata. Recently, an approach has also been proposed and tested for triaging data generated by and extracted from a computer-based operating system. However, this approach did not cover smartphone operating systems, and it did not consider key steps such as feature engineering, feature selection, and hyper-parameter tuning. Hence, in this paper, we propose a comprehensive machine learning based solution, with features extracted from file metadata, to identify smartphone files of possible interest that should be examined. A range of classification algorithms are tested and their performance compared. Our classification models were trained and tested on a dataset consisting of the metadata of nearly 2 million files extracted from devices running Android OS and linked to real terrorism cases. The use of real case data yields realistic results, and restricting the operating system and case type narrows the experimental scope enough to provide a proof of concept. Through our experiments, the best-performing classifier is also identified. (C) 2021 The Authors. Published by Elsevier Ltd.
