Journal: Concurrency, practice and experience

A fast malware feature selection approach using a hybrid of multi-linear and stepwise binary logistic regression



Abstract

Malware replicates itself and, by using code obfuscation techniques, produces offspring with the same characteristics but different signatures. Current-generation anti-virus engines employ a signature-template detection approach, so malware can easily evade the existing signatures in the database, which reduces the engines' ability to detect it. In this paper, we propose a stepwise binary logistic regression-based dimensionality reduction technique for malware detection using application program interface (API) call statistics. Finding the most significant malware features with traditional wrapper-based approaches has complexity exponential in the dimension (m) of the dataset under a brute-force search strategy, and complexity of order (m-1) under a backward elimination filter heuristic. The novelty of the proposed approach is that its worst-case computational complexity is less than order (m-1). The proposed approach uses multi-linear regression and the p-value of each individual API feature to select the most uncorrelated and significant features, which reduces the dimensionality of the large malware dataset and ensures the absence of multi-collinearity. Stepwise logistic regression is then employed to test the significance of each individual malware feature based on its Wald statistic and to construct the binary decision model. When the selected most significant APIs are used in a decision rule generation system, this approach not only reduces the tree size but also improves classification performance. Exhaustive experiments on a large malware dataset show that the proposed approach clearly outperforms the existing standard decision rule and support vector machine-based template approaches that use the complete data, and that it provides a better statistical fit.
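The Wald-statistic test mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it screens each feature with its own one-variable logistic regression rather than running the full stepwise procedure, and it omits the multi-linear regression stage entirely. The API names, the call counts, and the 0.05 threshold are all made-up assumptions for illustration.

```python
import math

def fit_logistic_1d(x, y, iters=25):
    """Fit P(y=1|x) = sigmoid(b0 + b1*x) by Newton-Raphson.

    Returns (b1, se_b1). The standard error comes from the inverse of the
    information matrix evaluated in the last iteration.
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = 0.0          # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # information matrix (negative Hessian)
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += H^{-1} g, with the 2x2 inverse written out
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    se_b1 = math.sqrt(h00 / det)
    return b1, se_b1

def wald_test(x, y):
    """Return (coefficient, Wald statistic, p-value) for one feature."""
    b1, se = fit_logistic_1d(x, y)
    wald = (b1 / se) ** 2
    # Survival function of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return b1, wald, p_value

# Hypothetical toy data: 20 benign (label 0) and 20 malware (label 1) samples.
labels = [0] * 20 + [1] * 20
features = {
    # Call counts that tend to be higher for malware, with some overlap
    "CreateRemoteThread": [0, 1, 1, 2, 2, 3, 3, 4, 5, 6] * 2
                          + [3, 4, 5, 5, 6, 6, 7, 8, 8, 9] * 2,
    # Call counts with the same distribution in both classes
    "GetTickCount": [1, 2, 3, 4, 5] * 4 + [1, 2, 3, 4, 5] * 4,
}

ALPHA = 0.05  # assumed significance threshold
results = {name: wald_test(x, labels) for name, x in features.items()}
selected = [name for name, (_, _, p) in results.items() if p < ALPHA]

for name, (b1, wald, p) in results.items():
    print(f"{name}: coef={b1:.3f} Wald={wald:.3f} p={p:.4f}")
print("selected:", selected)
```

In this sketch a feature is kept only when its Wald p-value falls below the threshold, so the feature whose counts do not differ between the classes is dropped. The fit assumes the classes are not perfectly separable by the feature; with separable data the maximum-likelihood estimate diverges and the Wald test is unreliable.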
