Journal: Future Generation Computer Systems

Classification of ransomware families with machine learning based on N-gram of opcodes


Abstract

Ransomware is a special type of malware that can lock victims' screens and/or encrypt their files to extort a ransom, causing great damage to users. Mapping ransomware into families is useful for identifying variants of a known ransomware sample and for reducing analysts' workload. However, ransomware that can fingerprint its environment can evade approaches based on dynamic analysis. To overcome this shortcoming, we propose what is, to the best of our knowledge, the first approach based on static analysis for classifying ransomware. First, opcode sequences from ransomware samples are transformed into N-gram sequences. Then, term frequency-inverse document frequency (TF-IDF) is calculated for each N-gram to select feature N-grams, so that the selected N-grams discriminate better between families. Finally, we treat the vectors composed of the TF values of the feature N-grams as feature vectors and feed them to five machine-learning methods to perform ransomware classification. Six evaluation criteria are employed to validate the model. Experiments on real datasets show that our approach achieves a best accuracy of 91.43%. Furthermore, the average F1-measure for the "wannacry" ransomware family is up to 99%, and the accuracy of binary classification is up to 99.3%. The proposed method can detect and classify ransomware that fingerprints its environment. In addition, we find that feature N-grams of different lengths require different feature dimensions to achieve similar classifier performance. © 2018 Elsevier B.V. All rights reserved.
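The feature-extraction steps described in the abstract (opcode sequences to N-grams, TF-IDF-based selection of feature N-grams, then TF-valued feature vectors) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy opcode sequences, the choice to treat each sample as one document, and the ranking of N-grams by their highest TF-IDF score across samples are all assumptions made here. The classification step is left out because the abstract does not name the five methods used.

```python
import math
from collections import Counter


def opcode_ngrams(opcodes, n):
    """Slide a window of length n over an opcode sequence."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def term_frequencies(ngrams):
    """Relative frequency of each N-gram within one sample."""
    counts = Counter(ngrams)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}


def select_feature_ngrams(tf_per_sample, k):
    """Keep the k N-grams with the highest TF-IDF score in any sample.

    Assumption: each sample is one "document"; the paper may instead
    compute the statistic per family.
    """
    n_samples = len(tf_per_sample)
    # Document frequency: in how many samples each N-gram occurs.
    doc_freq = Counter(g for tf in tf_per_sample for g in tf)
    best = {}
    for tf in tf_per_sample:
        for g, value in tf.items():
            score = value * math.log(n_samples / doc_freq[g])
            best[g] = max(best.get(g, 0.0), score)
    # Sort by descending score; break ties by the N-gram itself.
    ranked = sorted(best, key=lambda g: (-best[g], g))
    return ranked[:k]


def feature_vector(tf, features):
    """TF values of the selected feature N-grams, in a fixed order."""
    return [tf.get(g, 0.0) for g in features]


# Hypothetical toy opcode sequences standing in for disassembled samples.
samples = {
    "sample_a": ["push", "mov", "call", "mov", "call", "xor"],
    "sample_b": ["push", "mov", "xor", "xor", "xor", "ret"],
}

tfs = {name: term_frequencies(opcode_ngrams(ops, 2))
       for name, ops in samples.items()}
features = select_feature_ngrams(list(tfs.values()), k=2)
vectors = {name: feature_vector(tf, features) for name, tf in tfs.items()}

for name, vec in vectors.items():
    print(name, vec)
```

In this toy run the bigram `("push", "mov")` occurs in both samples, so its inverse document frequency is zero and it is not selected, which is the intended effect of using TF-IDF to pick N-grams that discriminate between samples. The resulting `vectors` would then be passed, together with family labels, to whichever classifier is being evaluated.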
