A fast malware feature selection approach using a hybrid of multi-linear and stepwise binary logistic regression

Shamsul Huda; Jemal Abawajy; Mali Abdollahian; Rafiqul Islam; John Yearwood

Abstract

Malware replicates itself and, by using code obfuscation techniques, produces offspring with the same characteristics but different signatures. Current-generation anti-virus engines employ a signature-template detection approach, so malware can easily evade the existing signatures in the database. This reduces the capability of current anti-virus engines to detect malware. In this paper, we propose a stepwise binary logistic regression-based dimensionality reduction technique for malware detection using application program interface (API) call statistics. Finding the most significant malware features with traditional wrapper-based approaches has a complexity that is exponential in the dimension (m) of the dataset under a brute-force search strategy, and of order (m-1) under a backward elimination filter heuristic. The novelty of the proposed approach is that its worst-case computational complexity is less than order (m-1). The proposed approach uses multi-linear regression and the p-value of each individual API feature to select the most uncorrelated and significant features, in order to reduce the dimensionality of the large malware data and to ensure the absence of multi-collinearity. The stepwise logistic regression approach is then employed to test the significance of each individual malware feature based on its corresponding Wald statistic and to construct the binary decision model. When the selected most significant APIs are used in a decision rule generation system, this approach not only reduces the tree size but also improves classification performance. Exhaustive experiments on a large malware data set show that the proposed approach clearly outperforms the existing standard decision rule and support vector machine-based template approaches that use the complete data, and provides a better statistical fit.
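
The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the collinearity stage here uses the variance inflation factor (VIF), obtained by regressing each feature on the others, as a stand-in for the paper's multi-linear regression step; the second stage is a plain backward elimination on the Wald chi-square of each logistic regression coefficient. The function names, the VIF threshold of 5, the synthetic data, and the use of 3.841 (the 5% critical value of a chi-square distribution with one degree of freedom) are all assumptions made for illustration.

```python
import numpy as np


def vif_filter(X, threshold=5.0):
    """Return column indices left after dropping high-VIF columns one at a time.

    Each column is regressed on the others by least squares (a multi-linear
    regression); VIF = 1 / (1 - R^2). The threshold of 5 is an assumption.
    """
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        vifs = []
        for j in keep:
            others = [k for k in keep if k != j]
            A = np.column_stack([np.ones(len(X)), X[:, others]])
            coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
            resid = X[:, j] - A @ coef
            total = X[:, j] - X[:, j].mean()
            r2 = 1.0 - (resid @ resid) / (total @ total)
            vifs.append(1.0 / max(1.0 - r2, 1e-12))
        worst = int(np.argmax(vifs))
        if vifs[worst] <= threshold:
            break
        keep.pop(worst)
    return keep


def fit_logistic(X, y, n_iter=50, ridge=1e-8):
    """Fit binary logistic regression by Newton-Raphson.

    Returns (beta, wald), where wald[i] = (beta[i] / se[i]) ** 2 and
    index 0 is the intercept.
    """
    A = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(A.shape[1])

    def hessian(b):
        p = 1.0 / (1.0 + np.exp(-np.clip(A @ b, -30, 30)))
        w = p * (1.0 - p)
        return p, A.T @ (A * w[:, None]) + ridge * np.eye(A.shape[1])

    for _ in range(n_iter):
        p, H = hessian(beta)
        step = np.linalg.solve(H, A.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < 1e-8:
            break
    _, H = hessian(beta)
    se = np.sqrt(np.diag(np.linalg.inv(H)))
    return beta, (beta / se) ** 2


def backward_wald(X, y, keep, critical=3.841):
    """Drop the feature with the smallest Wald chi-square until all pass."""
    keep = list(keep)
    while keep:
        _, wald = fit_logistic(X[:, keep], y)
        scores = wald[1:]  # skip the intercept
        weakest = int(np.argmin(scores))
        if scores[weakest] >= critical:
            break
        keep.pop(weakest)
    return keep


def make_demo_data(seed=0, n=2000):
    """Synthetic stand-in for API call statistics (hypothetical data)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 6))
    X[:, 2] = X[:, 0] + 0.05 * rng.normal(size=n)  # near-copy of column 0
    prob = 1.0 / (1.0 + np.exp(-(2.0 * X[:, 0] - 2.0 * X[:, 1])))
    y = (rng.random(n) < prob).astype(float)
    return X, y


if __name__ == "__main__":
    X, y = make_demo_data()
    uncorrelated = vif_filter(X)
    print("after collinearity filter:", uncorrelated)
    print("after Wald elimination:", backward_wald(X, y, uncorrelated))
```

In this toy data, columns 0 and 2 are nearly identical, so the first stage should keep only one of them before the logistic model is fitted; the second stage then removes features whose coefficients are not significant. Each pass of the backward loop refits one model and drops at most one feature, which is the kind of linear-in-m search the abstract contrasts with exhaustive subset search.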

Bibliographic record

Source
Concurrency, practice and experience | 2017, Issue 23 | e3912.1-e3912.18 | 18 pages
Authors
Shamsul Huda; Jemal Abawajy; Mali Abdollahian; Rafiqul Islam; John Yearwood
Author affiliations

School of Information Technology, Deakin University, Geelong, Vic 3216, Australia;

School of Information Technology, Deakin University, Geelong, Vic 3216, Australia;

School of Mathematical and Geospatial Sciences, RMIT University, Melbourne, Vic 3000, Australia;

Charles Sturt University, Albury, NSW 2640, Australia;

School of Information Technology, Deakin University, Geelong, Vic 3216, Australia;

Original format: PDF
Language of text: English
Keywords
malware detection; binary logistic regression; stepwise regression; API call statistics; AIC criteria; chi-square

