Improving Software Productivity and Quality via Mining Source Code.

Abstract

The major goal of software development is to deliver high-quality software efficiently. To achieve this goal, programmers often reuse existing frameworks or libraries (hereafter referred to as libraries) instead of developing similar code artifacts from scratch. However, programmers often face challenges in reusing existing libraries due to two major factors. First, many existing libraries are not well documented, and even when documentation exists, it is often outdated. Second, many existing libraries expose a large number of application programming interfaces (APIs), the interfaces through which libraries expose their functionality. For example, the .NET base library provides nearly 10,000 API classes. These two factors lead to three major problems that affect both software productivity and quality. First, programmers spend more time reusing existing libraries, which reduces software productivity. Second, programmers introduce defects while using APIs because they lack proper knowledge of how to reuse those APIs. Third, existing white-box test generation techniques face challenges in effectively generating test inputs for client code that reuses libraries.

To address these three issues, in this dissertation we propose a general framework, called WebMiner, that uses existing open source code available on the web by leveraging a code search engine. In particular, WebMiner infers usage specifications for the API methods under analysis by automatically collecting relevant code examples from the open source code available on the web. WebMiner then applies data mining techniques to the collected code examples to identify common patterns, which represent likely usage of APIs and are referred to as API usage specifications. Identifying common patterns rests on the observation that the majority of programmers correctly adhere to API usage specifications, so common patterns are likely to represent the correct usage of APIs.

We further propose six approaches based on our general framework, where each approach focuses on a specific software engineering (SE) task, such as detecting defects in an application under analysis. The first two approaches assist programmers in effectively reusing APIs provided by existing libraries. The next two approaches use mined API usage specifications as programming rules and detect defects in applications under analysis as deviations from the mined specifications. The last two approaches mine static and dynamic traces, respectively, to generate test inputs that achieve high structural coverage of the code under test. We also propose another approach that addresses a major limitation of mining-based approaches: they are not effective when usage information is unavailable for the API methods under analysis or is insufficient to achieve the SE task at hand.

Our empirical results show that the approaches developed on our WebMiner framework effectively address their respective SE tasks. In particular, the results demonstrate the effectiveness of expanding the data scope of mining-based approaches to the large body of open source code available on the web. Our results also show that our approaches address queries posted in developer forums and detect new defects that are not detected by existing related approaches, thereby improving both software productivity and quality.
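The idea of mining common patterns from collected code examples and flagging deviations as likely defects can be illustrated with a small sketch. This is not WebMiner's actual algorithm, which the abstract does not specify; it is a minimal stand-in that mines ordered call pairs "a is later followed by b" by simple counting. The function names (`mine_call_pairs`, `find_deviations`), the thresholds, and the toy call sequences are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def mine_call_pairs(examples, min_count=2, min_confidence=0.75):
    """Mine rules (a, b) meaning 'a call to a is later followed by b'.

    Assumption: each example is a list of API method names in call order.
    A rule is kept if it occurs in at least min_count examples and in at
    least min_confidence of the examples that call a.
    """
    call_counts = Counter()   # number of examples containing each call
    pair_counts = Counter()   # number of examples containing each ordered pair
    for seq in examples:
        call_counts.update(set(seq))
        pairs = set()
        for i, a in enumerate(seq):
            for b in seq[i + 1:]:
                if a != b:
                    pairs.add((a, b))
        pair_counts.update(pairs)

    rules = {}
    for (a, b), count in pair_counts.items():
        confidence = count / call_counts[a]
        if count >= min_count and confidence >= min_confidence:
            rules[(a, b)] = confidence
    return rules


def find_deviations(sequence, rules):
    """Return mined rules (a, b) where a occurs in sequence but b never follows it."""
    violations = []
    for a, b in sorted(rules):
        if a in sequence:
            first = sequence.index(a)
            if b not in sequence[first + 1:]:
                violations.append((a, b))
    return violations


# Hypothetical call sequences, standing in for examples gathered by a code search.
examples = [
    ["open", "read", "close"],
    ["open", "write", "close"],
    ["open", "read", "close"],
    ["open", "write", "close"],
    ["open", "read"],          # a minority usage that never closes
]

rules = mine_call_pairs(examples)
print(sorted(rules))
print(find_deviations(["open", "read"], rules))
```

With these toy inputs, the pair ("open", "close") appears in four of the five examples that call "open", so it is kept as a rule, and a sequence that opens without closing is reported as a deviation. A realistic miner would need to handle control flow, receiver objects, and noisy search results, none of which this sketch attempts.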

Bibliographic details

Author: Thummalapenta, Suresh
Author affiliation: North Carolina State University
Degree-granting institution: North Carolina State University
Subject: Computer Science
Degree: Ph.D.
Year: 2011
Pages: 185
Original format: PDF
Language: English

Similar literature

1. Data Mining Techniques for Software Quality Prediction in Open Source Software [J]. Marco Canaparo, Elisabetta Ronchieri. EPJ Web of Conferences, 2019, No. 6.
2. Data Mining Techniques for Software Quality Prediction in Open Source Software [J]. Marco Canaparo, Elisabetta Ronchieri. EPJ Web of Conferences, 2019, No. 2.
3. Mining Software History to Improve Software Maintenance Quality: A Case Study [J]. Tarvo Alexander. IEEE Software, 2009, No. 1.
4. Data Mining Technique Effectiveness for Improving Software Productivity and Quality [C]. Kobayashi Toru. 36th Annual IEEE International Computer Software and Applications Conference, vol. 1, Main Conference, 2012.
5. Engaging developers in open source software projects: Harnessing social and technical data mining to improve software development [D]. Carlson, Patrick Eric. 2015.
6. An open source software for fast grid-based data-mining in spatial epidemiology (FGBASE) [O]. David M Baker, Alain-Jacques Valleron. 2014.
7. Improving Software Reliability and Productivity via Mining Program Source Code [O]. Tao Xie, Mithun Acharya, Suresh Thummalapenta. 2008.
8. Mining Program Source Code for Improving Software Quality [R]. Xie, T. 2013.
