
The impact factors on the performance of machine learning-based vulnerability detection: A comparative study



Abstract

Machine learning-based vulnerability detection is an active research topic in software security. A variety of vulnerability detection methods have been proposed, based on both traditional machine learning and deep learning. To the best of our knowledge, we are the first to identify four impact factors and to conduct a comparative study of how these factors influence detection performance. In particular, the quality of the datasets, the classification models, and the vectorization methods directly affect detection performance, whereas function/variable name replacement changes the features used for vulnerability detection and thereby affects performance indirectly. We collect three vulnerability code datasets from two sources (i.e., NVD and SARD). These datasets correspond to different types of vulnerabilities. Moreover, we extract and analyze the features of the vulnerability code datasets to explain some of the experimental results. Our findings can be summarized as follows: (1) Deep learning models achieve better performance than traditional machine learning models. Of all the models, BLSTM achieves the best performance. (2) CountVectorizer significantly improves the performance of traditional machine learning models. (3) Features generated by the random forest algorithm include system-related functions, syntax keywords, and user-defined names. Different vulnerability types and code sources generate different features. (4) Replacing user-defined variable and function names in the datasets decreases the performance of vulnerability detection. (5) As the proportion of code from SARD increases, the performance of vulnerability detection increases.
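To make two of the factors named in the abstract more concrete, the following is a minimal sketch of user-defined name replacement followed by token counting. It is not the authors' implementation: the abstract does not describe their replacement scheme, so the `FUN1`/`VAR1` placeholder names, the small `KEEP` set of keywords and library calls, and the "followed by `(`" heuristic are all assumptions made for illustration. `count_tokens` is a standard-library stand-in for the bag-of-words counting that CountVectorizer (presumably the scikit-learn class) performs, not that class itself.

```python
import re
from collections import Counter

# Assumption: a small, non-exhaustive set of C keywords and library calls
# that should survive replacement unchanged.
KEEP = {"int", "char", "void", "if", "else", "for", "while", "return",
        "sizeof", "strcpy", "strncpy", "memcpy", "malloc", "free", "printf"}

# Identifier-like tokens; \b keeps the pattern from matching inside "0x1F".
IDENT = re.compile(r"\b[A-Za-z_]\w*")


def replace_names(code):
    """Map user-defined identifiers to FUN1, FUN2, ... and VAR1, VAR2, ..."""
    mapping = {}
    counts = {"FUN": 0, "VAR": 0}

    def substitute(match):
        name = match.group(0)
        if name in KEEP:
            return name
        if name not in mapping:
            # Heuristic: an identifier directly followed by "(" is a function.
            rest = code[match.end():].lstrip()
            kind = "FUN" if rest.startswith("(") else "VAR"
            counts[kind] += 1
            mapping[name] = f"{kind}{counts[kind]}"
        return mapping[name]

    return IDENT.sub(substitute, code)


def count_tokens(code):
    """Bag-of-words counts over identifier-like tokens."""
    return Counter(IDENT.findall(code))


snippet = "void copy_input(char *user_buf) { char dst[8]; strcpy(dst, user_buf); }"
normalized = replace_names(snippet)
print(normalized)
print(count_tokens(normalized))
```

After replacement, names such as `copy_input` and `user_buf` no longer appear as tokens, so any signal they carried is unavailable to a count-based model. This is one plausible reading of finding (4). The sketch does not skip string literals or comments, which a real preprocessing step would need to handle.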

Bibliographic information

  • Source
    The Journal of Systems and Software | 2020, Issue 10 | 110659.1-110659.12 | 12 pages
  • Author affiliations

    School of Software, Northwestern Polytechnical University, Xi'an, China; Key Laboratory of Advanced Perception and Intelligent Control of High-end Equipment, Ministry of Education, Anhui Polytechnic University, China; National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, Northwestern Polytechnical University, China;

    School of Software, Northwestern Polytechnical University, Xi'an, China;

    School of Cyberspace Security, Northwestern Polytechnical University, Xi'an, China;

    School of Computer Science, Northwestern Polytechnical University, Xi'an, China;

    School of Software, Northwestern Polytechnical University, Xi'an, China;

    School of Software, Northwestern Polytechnical University, Xi'an, China;

    School of Information Science and Technology, Nantong University, Nantong, China;

  • Indexing information
  • Original format: PDF
  • Language of text: English
  • CLC classification
  • Keywords

    Vulnerability detection; Machine learning; Comparative study; Deep learning; Feature extraction;

