Identification of Security Related Bug Reports via Text Mining Using Supervised and Unsupervised Classification


Abstract

While many prior works have used text mining to automate tasks related to software bug reports, few have considered the security aspects. This paper focuses on the automated classification of software bug reports into security related and non-security related, using both supervised and unsupervised approaches. For both approaches, three types of feature vectors are used. For supervised learning, we experiment with multiple classifiers and with training sets of different sizes. Furthermore, we propose a novel unsupervised approach based on anomaly detection. The evaluation is based on three NASA datasets. The results show that supervised classification is affected more by the learning algorithm than by the feature vectors, and that training on only 25% of the data gives results as good as training on 90% of the data. Supervised learning slightly outperforms unsupervised learning, at the expense of labeling the training set. In general, datasets with more security information lead to better performance.
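
The abstract does not name the three feature vector types, the classifiers, or the anomaly detector, so the following is only a minimal sketch under stated assumptions and not the authors' method. It assumes bag-of-words term-frequency vectors as the feature type, uses a nearest-centroid classifier with cosine similarity as a stand-in for the supervised learner, and scores anomalies as distance from the centroid of all reports (assuming security reports are rare and lexically distinct). All function names and the toy bug reports are hypothetical.

```python
import math
from collections import Counter

def tf_vector(text):
    """Bag-of-words term-frequency vector (an assumed feature type)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def centroid(vectors):
    """Sum of vectors; cosine similarity ignores scale, so no division is needed."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total

# --- Supervised stand-in: nearest centroid over labeled reports ---
def fit_centroids(labeled):
    """labeled: iterable of (text, label); label 1 = security, 0 = non-security."""
    by_label = {}
    for text, label in labeled:
        by_label.setdefault(label, []).append(tf_vector(text))
    return {label: centroid(vs) for label, vs in by_label.items()}

def predict_supervised(centroids, text):
    """Return the label whose centroid is most similar to the report."""
    v = tf_vector(text)
    return max(centroids, key=lambda label: cosine(v, centroids[label]))

# --- Unsupervised stand-in: anomaly score without any labels ---
def anomaly_scores(texts):
    """Higher score = less similar to the bulk of reports = more anomalous."""
    vectors = [tf_vector(t) for t in texts]
    c = centroid(vectors)
    return [1.0 - cosine(v, c) for v in vectors]

# Hypothetical toy data, invented for illustration only.
train = [
    ("buffer overflow allows unauthorized access to memory", 1),
    ("password stored in plain text in config file", 1),
    ("button label misaligned on settings screen", 0),
    ("report export crashes when table is empty", 0),
]
centroids = fit_centroids(train)
print(predict_supervised(centroids, "unauthorized access through overflow in parser"))

unlabeled = [
    "screen layout broken on settings page",
    "settings page layout overlaps on small screen",
    "layout of report page broken on screen",
    "attacker can bypass login with crafted token",
]
scores = anomaly_scores(unlabeled)
print(scores.index(max(scores)))  # index of the most anomalous report
```

The supervised path needs the labels in `train`, while `anomaly_scores` uses none, which mirrors the trade-off the abstract describes: slightly better accuracy in exchange for the cost of labeling.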


Bibliographic details

Source
IEEE International Conference on Software Quality, Reliability, and Security | 2018 | 515p | 12 pages
Authors
Katerina Goseva-Popstojanova; Jacob Tyo
Original format
PDF
Chinese Library Classification
Computing technology, computer technology
Keywords
Computer bugs; Security; Software; Training; Text mining; Machine learning algorithms; Measurement


