Computers & Security

Differential area analysis for ransomware attack detection within mixed file datasets

Abstract

The threat from ransomware continues to grow, both in the number of victims affected and in the cost incurred by the people and organisations impacted by a successful attack. In the majority of cases, once victims have been attacked, only two courses of action remain open to them: pay the ransom or lose their data. One behaviour shared by all crypto ransomware strains is that, at some point during execution, they attempt to encrypt the users' files. This paper demonstrates a technique that identifies when these encrypted files are being generated and that is independent of the ransomware strain. An enhanced mixed-file ransomware data set of more than 130,000 files was developed based on the govdocs1 (Garfinkel, 2020) corpus. This data set was enriched with examples of more modern Microsoft file formats, as well as examples of high-entropy file formats such as compressed files and archives. The data set also contained eight different sets of files generated by real-world, high-profile ransomware attacks such as WannaCry, Ryuk, Phobos, Sodinokibi and NetWalker. Previous research (Penrose et al., 2013; Zhao et al., 2011) has highlighted the difficulty of differentiating between compressed and encrypted files using Shannon entropy, as both file types exhibit similar values. One of the experiments described in this paper reveals a unique characteristic in the Shannon entropy of encrypted file header fragments. This characteristic was used to differentiate encrypted files from other high-entropy files such as archives. The discovery was leveraged to develop a file classification model based on the differential area between the entropy curve of the file under analysis and an entropy curve generated from random data.
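The difficulty cited from earlier work, and the contrast in header fragments, can be illustrated with a small Python sketch. It assumes that a "header fragment" means the first k bytes of a file, builds a hypothetical ZIP-style archive (a single ZIP local file header followed by raw deflate data) as the compressed example, and uses `os.urandom` bytes as a stand-in for ransomware ciphertext. The function names (`shannon_entropy`, `fragment_entropies`, `zip_like_file`) and the 8-byte fragment step are illustrative choices, not taken from the paper.

```python
import math
import os
import struct
import zlib
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    # Sum of p * log2(1/p) over the byte values that occur in the data.
    return sum((c / n) * math.log2(n / c) for c in Counter(data).values())


def fragment_entropies(data: bytes, max_len: int = 192, step: int = 8) -> list[float]:
    """Entropy of the header fragments data[:8], data[:16], ..., data[:max_len]."""
    return [shannon_entropy(data[:k]) for k in range(step, max_len + 1, step)]


def zip_like_file(payload: bytes, name: bytes = b"document.txt") -> bytes:
    """Hypothetical archive: one ZIP local file header followed by raw deflate data."""
    compressed = zlib.compress(payload)[2:-4]  # drop 2-byte zlib header and Adler-32 trailer
    header = struct.pack(
        "<4sHHHHHIIIHH",
        b"PK\x03\x04",      # local file header signature
        20, 0, 8, 0, 0,     # version needed, flags, method 8 (deflate), mod time, mod date
        zlib.crc32(payload),
        len(compressed),
        len(payload),
        len(name),
        0,                  # extra field length
    )
    return header + name + compressed


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    words = ["ransom", "file", "entropy", "header", "archive", "cipher", "data", "key"]
    text = " ".join(rng.choice(words) for _ in range(20000)).encode()
    archive = zip_like_file(text)
    ciphertext_stand_in = os.urandom(len(archive))  # random bytes model encrypted output

    # Whole-file entropy: typically similar, and near 8 bits/byte, for both inputs.
    print("whole file:", round(shannon_entropy(archive), 3), round(shannon_entropy(ciphertext_stand_in), 3))
    # Header fragments: the archive's fixed signature and small integer fields lower its curve.
    print("archive  :", [round(e, 2) for e in fragment_entropies(archive)[:6]])
    print("random   :", [round(e, 2) for e in fragment_entropies(ciphertext_stand_in)[:6]])
```

Over the whole file the two entropy values usually land close together, which is the problem the cited work describes. The short fragments separate because the archive begins with a fixed signature and small little-endian integers (many zero bytes), whereas random data has no such structure.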
When the entropy plot of a file under analysis is compared with one generated from a file containing purely random numbers, the greater the correlation between the two plots, the higher the confidence that the file under analysis contains encrypted data. The experiments demonstrate a high degree of confidence in the model's accuracy, with a success rate of more than 99.96% when examining only the first 192 bytes of a file, using a mixed data set of more than 80,000 files. This technique successfully addresses the problem of using file entropy to distinguish, in a timely manner, compressed and archived files from files encrypted by ransomware.
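The comparison against a random-data curve can be sketched as a small classifier in Python. The abstract does not give the exact construction, so this sketch makes several assumptions: the entropy curve is the entropy of header prefixes 8, 16, ..., 192 bytes long; the random reference is the mean curve of 100 seeded pseudo-random buffers; and the area between the curves is computed with the trapezoidal rule. The 50.0 threshold, the step size and all function names are hypothetical placeholders rather than values or identifiers from the paper, and a real threshold would have to be calibrated on labelled data.

```python
import math
import random
from collections import Counter

FRAGMENT_MAX = 192  # header bytes examined, as in the abstract
STEP = 8            # hypothetical spacing between fragment lengths


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte."""
    if not data:
        return 0.0
    n = len(data)
    return sum((c / n) * math.log2(n / c) for c in Counter(data).values())


def entropy_curve(data: bytes) -> list[float]:
    """Entropy of data[:8], data[:16], ..., data[:192]."""
    return [shannon_entropy(data[:k]) for k in range(STEP, FRAGMENT_MAX + 1, STEP)]


def random_reference_curve(samples: int = 100, seed: int = 0) -> list[float]:
    """Mean entropy curve over pseudo-random buffers (stand-in for 'random data')."""
    rng = random.Random(seed)
    curves = [entropy_curve(rng.randbytes(FRAGMENT_MAX)) for _ in range(samples)]
    return [sum(values) / samples for values in zip(*curves)]


def differential_area(curve: list[float], reference: list[float]) -> float:
    """Trapezoidal area between the two curves (bits per byte x bytes)."""
    gaps = [abs(a - b) for a, b in zip(curve, reference)]
    return sum((g0 + g1) / 2 * STEP for g0, g1 in zip(gaps, gaps[1:]))


def looks_encrypted(header: bytes, reference: list[float], threshold: float = 50.0) -> bool:
    """Flag a file whose header curve hugs the random curve (hypothetical threshold)."""
    if len(header) < FRAGMENT_MAX:
        return False  # too short to build the full curve
    return differential_area(entropy_curve(header[:FRAGMENT_MAX]), reference) < threshold


if __name__ == "__main__":
    import os

    ref = random_reference_curve()
    for label, sample in [
        ("random (ciphertext stand-in)", os.urandom(FRAGMENT_MAX)),
        ("plain text", b"The quick brown fox jumps over the lazy dog. " * 5),
        ("zero padding", bytes(FRAGMENT_MAX)),
    ]:
        area = differential_area(entropy_curve(sample[:FRAGMENT_MAX]), ref)
        print(f"{label:30s} area={area:8.2f} encrypted={looks_encrypted(sample, ref)}")
```

A small area means the file's curve closely tracks the random curve, which the abstract associates with encrypted content; structured headers pull the curve below the reference and produce a larger area. Averaging several random buffers is a design choice made here to reduce noise in the reference; the abstract itself describes comparison against a single file of random numbers.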