The impact factors on the performance of machine learning-based vulnerability detection: A comparative study

Wei Zheng; Jialiang Gao; Xiaoxue Wu; Fengyu Liu; Yuxing Xun; Guoliang Liu; Xiang Chen


Abstract

Machine learning-based vulnerability detection is an active research topic in software security. Various traditional machine learning-based and deep learning-based vulnerability detection methods have been proposed. To the best of our knowledge, we are the first to identify four impact factors and conduct a comparative study of their influence on performance. In particular, the quality of datasets, the classification models, and the vectorization methods can directly affect detection performance, whereas function/variable name replacement affects the features used for vulnerability detection and thus affects performance indirectly. We collect three vulnerability code datasets from two sources (i.e., NVD and SARD). These datasets correspond to different types of vulnerabilities. Moreover, we extract and analyze the features of the vulnerability code datasets to explain some of the experimental results. Our findings can be summarized as follows: (1) Deep learning models achieve better performance than traditional machine learning models; of all the models, BLSTM achieves the best performance. (2) CountVectorizer can significantly improve the performance of traditional machine learning models. (3) Features generated by the random forest algorithm include system-related functions, syntax keywords, and user-defined names; different vulnerability types and code sources generate different features. (4) Replacing user-defined variable and function names in the datasets decreases vulnerability detection performance. (5) As the proportion of code from SARD increases, vulnerability detection performance increases.
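To make two of the factors named in the abstract concrete, the following is a minimal sketch of user-defined name replacement followed by count-based vectorization. It is not the authors' pipeline: the keyword and library-function lists, the `FUN`/`VAR` placeholder scheme, the regex tokenizer, and all function names are assumptions made for illustration, and the token counting uses the standard library's `Counter` as a stand-in for the idea behind CountVectorizer rather than scikit-learn's class itself.

```python
import re
from collections import Counter

# Assumption: deliberately small, hypothetical lists; a real pipeline would use a full C lexer.
C_KEYWORDS = {"int", "char", "void", "if", "else", "for", "while", "return", "sizeof"}
LIBRARY_FUNCS = {"strcpy", "strncpy", "memcpy", "malloc", "free", "printf"}

IDENT_RE = re.compile(r"[A-Za-z_]\w*")
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")  # identifiers, integers, single symbols


def tokenize(code):
    """Split a code snippet into identifier, number, and symbol tokens."""
    return TOKEN_RE.findall(code)


def replace_user_names(tokens):
    """Map user-defined identifiers to FUN1, VAR1, ... in order of first appearance."""
    mapping = {}
    n_fun = n_var = 0
    out = []
    for i, tok in enumerate(tokens):
        is_ident = IDENT_RE.fullmatch(tok) is not None
        if not is_ident or tok in C_KEYWORDS or tok in LIBRARY_FUNCS:
            out.append(tok)  # keywords, library calls, and symbols are kept as-is
            continue
        if tok not in mapping:
            # Heuristic (assumption): an identifier followed by "(" is a function name.
            if i + 1 < len(tokens) and tokens[i + 1] == "(":
                n_fun += 1
                mapping[tok] = f"FUN{n_fun}"
            else:
                n_var += 1
                mapping[tok] = f"VAR{n_var}"
        out.append(mapping[tok])
    return out


def count_vector(tokens, vocabulary):
    """Return the token counts of `tokens` in the order given by `vocabulary`."""
    counts = Counter(tokens)
    return [counts[t] for t in vocabulary]


snippet = "void copy_input(char *src) { char buf[16]; strcpy(buf, src); }"
raw = tokenize(snippet)
normalized = replace_user_names(raw)

# A shared vocabulary lets the two count vectors be compared position by position.
vocab = sorted(set(raw) | set(normalized))
raw_vec = count_vector(raw, vocab)
norm_vec = count_vector(normalized, vocab)
```

After replacement, the tokens `copy_input`, `src`, and `buf` no longer appear in the vector, so any signal a classifier could draw from user-defined names is removed while library calls such as `strcpy` remain. This is one plausible reading of why the abstract reports that name replacement changes the extracted features.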

Bibliographic details

Source
The Journal of Systems and Software | 2020, Issue 10 | 110659.1-110659.12 | 12 pages
Authors
Wei Zheng; Jialiang Gao; Xiaoxue Wu; Fengyu Liu; Yuxing Xun; Guoliang Liu; Xiang Chen
Author affiliations

School of Software, Northwestern Polytechnical University, Xi'an, China; Key Laboratory of Advanced Perception and Intelligent Control of High-end Equipment, Ministry of Education, Anhui Polytechnic University, China; National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, Northwestern Polytechnical University, China;

School of Software, Northwestern Polytechnical University, Xi'an, China;

School of Cyberspace Security, Northwestern Polytechnical University, Xi'an, China;

School of Computer Science, Northwestern Polytechnical University, Xi'an, China;

School of Software, Northwestern Polytechnical University, Xi'an, China;

School of Software, Northwestern Polytechnical University, Xi'an, China;

School of Information Science and Technology, Nantong University, Nantong, China;

Original format: PDF
Language: English
Keywords
Vulnerability detection; Machine learning; Comparative study; Deep learning; Feature extraction

Similar literature

1. A Meta-Analysis of Machine Learning-Based Science Assessments: Factors Impacting Machine-Human Score Agreements [J]. Zhai Xiaoming, Shi Lehong, Nehm Ross H. Journal of Science Education and Technology. 2021, Issue 3
2. A Comparative Study of Using Various Machine Learning and Deep Learning-Based Fraud Detection Models For Universal Health Coverage Schemes [J]. Rohan Yashraj Gupta, Satya Sai Mudigonda, Pallav Kumar Baruah. International Journal of Engineering Trends and Technology. 2021, Issue 3
3. Machine Learning-Based Detection of Credit Card Fraud: A Comparative Study [J]. Zainab Khamees Alkhateeb, Abeer Tariq Maolood. American Journal of Engineering and Applied Sciences. 2019, Issue 4
4. Factors Impacting the Effort Required to Fix Security Vulnerabilities: An Industrial Case Study [C]. Lotfi ben Othmane, Golriz Chehrazi, Eric Bodden, International Conference on Information Security. 2015
5. The study of machine level capacity constraints and the impact on system performance using X-factor theory [D]. Delp, Deana R. 2003
6. Improving the Performance of Machine Learning-Based Network Intrusion Detection Systems on the UNSW-NB15 Dataset [O]. Soulaiman Moualla, Khaldoun Khorzom, Assef Jafar. 2021
7. A Comparative Study of Deep Learning-Based Vulnerability Detection System [O]. Zhen Li, Deqing Zou, Jing Tang, 2019
