An Empirical Study of Data Mining Code Defect Patterns in Large Software Repositories



Abstract

There has been growing interest in mining software code defect patterns and using this knowledge to identify potential problems [2, 8-15]. To understand the benefits of such methods, we applied them to several large software repositories and learned both their effectiveness and their limitations. These methods are called "static analysis" because they analyze the code for defects without executing it. They differ from traditional static analysis in that they apply data mining, whereas traditional static analysis uses program analysis, such as data flow analysis and control flow analysis. There is a trend toward combining data mining methods with program analysis techniques to obtain more effective results.

Many tools are available to detect software defects, but a tool that has no knowledge of what to check cannot find defects [5]. There are common defect patterns, such as buffer overflow and null pointer dereference, and several tools, such as Purify [19] and FindBugs [20], address them. Application-specific defects, however, can be difficult to find, probably because different applications share few common patterns. Therefore some approaches, such as DynaMine [2], try to explore the patterns of specific applications automatically, while others provide description-based methods that express these patterns as assertion statements. Examples of tools and methods in this latter category are contract programming, AOP (Aspect Oriented Programming), PQL [3], and Metal [4]. Automatic approaches such as DynaMine may have difficulty exploring complex patterns. Description-based approaches such as PQL are often powerful at describing patterns, but they require the specifications to be constructed manually. Such tasks can be overwhelming [5].

In this paper, we dive into industry-level projects such as Harmony [16], an open source virtual machine, to analyze and summarize their defect patterns using different pattern analysis methods. We gain some insights into the classification of common code defect patterns. We use data mining methods to extract usage patterns automatically from the source code of different projects. We also look into the issue tracking systems and revision histories to analyze and summarize patterns. We apply these tools and methods to detect pattern-related defects in new source code as well. Several methods are evaluated for their effectiveness in detecting defect patterns.

The contributions of this paper are: (1) an empirical study evaluating the effectiveness of data mining for software code defects, using a few large software repositories, and (2) insights into the characteristics of the software systems and the common code defect patterns.

Bibliographic details

  • Source
    2009 | pp. 295-306 | 12 pages
  • Conference location: Portland, OR (US)
  • Author affiliations

    Software and Services Group, Intel Corporation;

    Software and Services Group, Intel Corporation;

    Software and Services Group, Intel Corporation;

    Software and Services Group, Intel Corporation;

  • Conference organizer
  • Original format: PDF
  • Text language: English
  • Chinese Library Classification: Computer software
  • Keywords
