Computers & Security

A cost analysis of machine learning using dynamic runtime opcodes for malware detection


Abstract

The ongoing battle between malware distributors and those seeking to prevent the onslaught of malicious code has, so far, favored the former. Anti-virus methods are faltering with the rapid evolution and distribution of new malware, with obfuscation and detection evasion techniques exacerbating the issue. Recent research has monitored low-level opcodes to detect malware. Such dynamic analysis reveals the code at runtime, allowing the true behaviour to be examined. While previous research uses machine learning techniques to accurately detect malware using dynamic runtime opcodes, the underpinning datasets have been poorly sampled and inadequate in size. Further, the datasets are always of fixed size and no attempt, to our knowledge, has been made to examine the cost of retraining malware classification models on datasets which grow continually. In the literature, researchers discuss the explosion of malware, yet opcode analyses have used fixed-size datasets, with no regard for how these models will cope with retraining on escalating datasets. The research presented here examines this problem, and makes several novel contributions to the current body of knowledge. First, the performance of 23 machine learning algorithms is investigated with respect to the largest run trace dataset in the literature. Second, following an extensive hyperparameter selection process, the performance of each classifier is compared, on both accuracy and computational cost (CPU time). Lastly, the cost of retraining and testing updatable and non-updatable classifiers, both parallelized and non-parallelized, is examined with simulated escalating datasets. This provides insight into how implemented malware classifiers would perform, given simulated dataset escalation. We find that parallelized RandomForest, using 4 cores, provides the optimal performance, with high accuracy and low training and testing times. © 2019 Elsevier Ltd. All rights reserved.
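
The retraining-cost comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' experimental setup: the classifier is a toy nearest-centroid model standing in for the 23 algorithms studied, the "opcode frequency" vectors are synthetic, and every name (`CentroidClassifier`, `make_batch`, `simulate`) is hypothetical. It only shows why a non-updatable model, retrained from scratch on each escalation of the dataset, accumulates more training work than an updatable model that sees each new batch once.

```python
import random
import time


class CentroidClassifier:
    """Toy updatable classifier: keeps a running per-class sum of feature vectors."""

    def __init__(self, n_features):
        self.n_features = n_features
        self.sums = {}    # label -> per-feature running sum
        self.counts = {}  # label -> number of samples seen
        self.samples_processed = 0  # deterministic proxy for training cost

    def partial_fit(self, X, y):
        for row, label in zip(X, y):
            sums = self.sums.setdefault(label, [0.0] * self.n_features)
            for i, value in enumerate(row):
                sums[i] += value
            self.counts[label] = self.counts.get(label, 0) + 1
            self.samples_processed += 1
        return self

    def predict(self, row):
        def sq_dist(label):
            n = self.counts[label]
            return sum((v - s / n) ** 2 for v, s in zip(row, self.sums[label]))
        return min(self.counts, key=sq_dist)


def make_batch(rng, size, n_features):
    """Synthetic stand-in for opcode frequency vectors (assumption, not real traces)."""
    X, y = [], []
    for _ in range(size):
        label = rng.randint(0, 1)  # 0 = benign, 1 = malicious
        X.append([rng.gauss(0.5 * label, 0.1) for _ in range(n_features)])
        y.append(label)
    return X, y


def simulate(n_batches=5, batch_size=200, n_features=8, seed=0):
    rng = random.Random(seed)
    all_X, all_y = [], []
    updatable = CentroidClassifier(n_features)
    full_samples = 0
    full_cpu = incr_cpu = 0.0
    for _ in range(n_batches):
        X, y = make_batch(rng, batch_size, n_features)
        all_X.extend(X)
        all_y.extend(y)

        # Non-updatable: discard the old model, retrain on everything seen so far.
        start = time.process_time()
        retrained = CentroidClassifier(n_features).partial_fit(all_X, all_y)
        full_cpu += time.process_time() - start
        full_samples += retrained.samples_processed

        # Updatable: fold only the new batch into the existing model.
        start = time.process_time()
        updatable.partial_fit(X, y)
        incr_cpu += time.process_time() - start

    test_X, test_y = make_batch(rng, 200, n_features)
    correct = sum(updatable.predict(r) == t for r, t in zip(test_X, test_y))
    return {
        "full_retrain_samples": full_samples,
        "incremental_samples": updatable.samples_processed,
        "full_retrain_cpu_s": full_cpu,
        "incremental_cpu_s": incr_cpu,
        "accuracy": correct / len(test_y),
    }


if __name__ == "__main__":
    for key, value in simulate().items():
        print(f"{key}: {value}")
```

With the default five batches of 200 samples, full retraining processes 200 + 400 + 600 + 800 + 1000 = 3000 samples in total, while the updatable model processes 1000. The CPU times come from `time.process_time()` and will vary by machine; the sample counts are the reproducible part of the comparison.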
