International Conference on Security and Cryptography

Behavior-based Malware analysis using profile hidden Markov models

Abstract

In malware analysis, static binary analysis is becoming increasingly difficult because of the code obfuscation and code packing methods employed by malware authors. For this reason, behavior-based analysis techniques are being used in large malware analysis systems. In these dynamic analysis systems, malware samples are executed and monitored in a controlled environment using tools such as CWSandbox (Willems et al., 2007). In previous work, a number of clustering and classification techniques from machine learning and data mining have been applied to the behavior reports to classify malware into families and even to identify new malware families. In our work, we propose to use the Profile Hidden Markov Model (PHMM) to classify malware files into families or groups based on their behavior on the host system. PHMMs have been used extensively in bioinformatics to search for similar protein and DNA sequences in large databases. We see that using this model helps us overcome the hurdle posed by polymorphism, which is common in malware today. We show that the classification accuracy is high and comparable with state-of-the-art methods, even when very few training samples are used to build the models. The experiments were conducted initially on a dataset with 24 families and later on a larger dataset with close to 400 different malware families. For the large dataset, a fast clustering method was used to group malware with similar behavior, following scoring against the PHMM profile database. We present the challenges in the evaluation methods and metrics for clustering a large number of malware files, and we show the effectiveness of profile hidden Markov models for known malware families.
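
The classification step described in the abstract, scoring a behavior sequence against one model per family and picking the best match, can be illustrated with a small sketch. This is not the authors' implementation: it uses a plain discrete HMM scored with the forward algorithm rather than a full profile HMM with match, insert, and delete states. The family names, the event alphabet ("file", "reg", "net"), and all probabilities are hypothetical values chosen only for illustration.

```python
import math

NEG_INF = float("-inf")

def safe_log(p):
    """Log of a probability, mapping 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of log values."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))

def forward_log_likelihood(seq, start, trans, emit):
    """Log P(seq | model) for a discrete HMM (seq must be non-empty)."""
    states = list(start)
    # Initialisation: start probability times emission of the first event.
    alpha = {s: safe_log(start[s]) + safe_log(emit[s].get(seq[0], 0.0))
             for s in states}
    # Recursion: sum over predecessor states, then emit the next event.
    for sym in seq[1:]:
        alpha = {
            s: log_sum_exp([alpha[r] + safe_log(trans[r][s]) for r in states])
               + safe_log(emit[s].get(sym, 0.0))
            for s in states
        }
    return log_sum_exp(list(alpha.values()))

# Hypothetical per-family models (assumed values, not taken from the paper).
START = {"s0": 0.6, "s1": 0.4}
TRANS = {"s0": {"s0": 0.7, "s1": 0.3}, "s1": {"s0": 0.4, "s1": 0.6}}
MODELS = {
    "family_a": (START, TRANS, {
        "s0": {"file": 0.7, "reg": 0.2, "net": 0.1},
        "s1": {"file": 0.2, "reg": 0.7, "net": 0.1},
    }),
    "family_b": (START, TRANS, {
        "s0": {"file": 0.1, "reg": 0.1, "net": 0.8},
        "s1": {"file": 0.2, "reg": 0.2, "net": 0.6},
    }),
}

def classify(seq):
    """Return the family whose model gives seq the highest log-likelihood."""
    return max(MODELS, key=lambda name: forward_log_likelihood(seq, *MODELS[name]))

print(classify(["file", "reg", "file"]))
print(classify(["net", "net", "reg"]))
```

In this toy setup a sequence dominated by file and registry events scores higher under family_a, while a network-heavy sequence scores higher under family_b. A real system would estimate the model parameters from behavior reports of known samples instead of hard-coding them.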
