Conference: Resilience Week

Large-scale exploration of feature sets and deep learning models to classify malicious applications

Abstract

In recent years, researchers have shown that deep learning (DL) can be used to construct highly accurate models to solve many problems. However, training DL models requires large datasets and vast amounts of computation. With millions of malware variants being created every day, we contend that there is plenty of data to build deep learning models to classify malicious applications. However, finding the best DL model for this task requires exploring a wide range of methods to characterize malware and a variety of DL models that can be used to classify malicious applications. To the best of our knowledge, no work has been presented that explores the large malware characterization space together with the variety of DL models that could be brought to bear on this problem. In this paper, we present our work on exploring a large set of features and different DL models for the problem of malware family classification. To make this possible, we built a scalable machine-learning platform on Amazon Web Services (AWS). This platform made it possible to train many DL models concurrently on thousands of machines while collecting accurate data and performance information at regular checkpoints for reliability. We used this platform to evaluate two hundred DL models for eleven different malware characterizations. These characterizations include seven novel graph-based characterizations of the structure of executable code. While state-of-the-art malware characterizations yield a 13.8% error rate, our novel graph-based characterizations yield an error rate below 6.3%.
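
The abstract does not describe how the platform is implemented, so the following is only a minimal local sketch of the general idea: sweep every (feature set, model configuration) pair concurrently and write a checkpoint after each epoch so an interrupted run can resume. The feature-set names, the `layers` parameter, the helper names `train_one` and `run_sweep`, and the placeholder error metric are all hypothetical, and a thread pool stands in for the AWS machines.

```python
import itertools
import json
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Hypothetical placeholders: the abstract does not name the feature sets
# or the model configurations that were explored.
FEATURE_SETS = ["feature_set_a", "feature_set_b"]
MODEL_CONFIGS = [{"layers": 2}, {"layers": 4}]


def train_one(job, checkpoint_dir, epochs=3):
    """Stand-in for one training run; writes a checkpoint after every epoch."""
    feature_set, config = job
    name = f"{feature_set}-layers{config['layers']}"
    path = os.path.join(checkpoint_dir, name + ".json")

    # Resume from the last checkpoint if this job was interrupted earlier.
    state = {"name": name, "epoch": 0, "error_rate": None}
    if os.path.exists(path):
        with open(path) as f:
            state = json.load(f)

    for epoch in range(state["epoch"] + 1, epochs + 1):
        # Placeholder metric, not a real training result.
        error_rate = 1.0 / (epoch + config["layers"])
        state = {"name": name, "epoch": epoch, "error_rate": error_rate}
        with open(path, "w") as f:
            json.dump(state, f)
    return state


def run_sweep(checkpoint_dir, max_workers=4):
    """Run every (feature set, model config) pair and return the best result."""
    jobs = list(itertools.product(FEATURE_SETS, MODEL_CONFIGS))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda job: train_one(job, checkpoint_dir), jobs))
    return min(results, key=lambda r: r["error_rate"])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_sweep(tmp))
```

Calling `run_sweep` a second time on the same checkpoint directory skips the finished epochs and returns the same result, which is the reliability property that per-epoch checkpoints are meant to provide.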