
Leveraging code comments to improve software reliability.


Abstract

Commenting source code has long been a common practice in software development. This thesis, consisting of three pieces of work, makes novel use of code comments written in natural language to improve software reliability. Our solution combines Natural Language Processing (NLP), Machine Learning, Statistics, and Program Analysis techniques to achieve this goal.

First, innovations from multiple directions have been proposed to improve software reliability. Unfortunately, many of these innovations are not fully exploited by programmers. To bridge the gap, we propose a new approach, cComment, to "listen" to thousands of programmers by studying their programming comments. Since comments express programmers' assumptions and intentions, comments can reveal programmers' needs. These needs provide guidance (1) for language/tool designers on where they should develop new techniques or enhance the usability of existing ones, and (2) for programmers on which problems are most pervasive and important, so that they can take the initiative to adopt existing tools or language extensions. We studied 1050 comments randomly sampled from the latest versions of Linux, FreeBSD, and OpenSolaris at the time of writing. We found that 52.6% of these comments could be leveraged by existing or to-be-proposed tools for improving reliability. Our findings include: (1) many comments describe code relationships, code evolution, or the usage and meaning of integers and integer macros, (2) a significant number of comments could be expressed in existing annotation languages, and (3) many comments express synchronization-related concerns that are not well supported by annotation languages.

Second, compared to source code, comments are more direct, descriptive, and easy to understand. Comments and source code provide relatively redundant and independent information about a program's semantic behavior. As software evolves, the two can easily grow out of sync, indicating one of two problems: (1) bugs, where the source code does not follow the assumptions and requirements specified by correct program comments; or (2) bad comments, where comments are inconsistent with correct code and can mislead programmers into introducing bugs in subsequent versions. Unfortunately, as most comments are written in natural language, no solution had been proposed to automatically analyze comments and detect inconsistencies between comments and source code. iComment took the first step in automatically analyzing comments written in natural language to extract implicit program rules, and in using these rules to automatically detect inconsistencies between comments and source code, indicating either bugs or bad comments. We evaluated iComment on four large code bases: Linux, Mozilla, Wine, and Apache. Our experimental results show that iComment automatically extracts 1832 rules from comments with 90.8-100% accuracy and detects 60 comment-code inconsistencies (33 new bugs and 27 bad comments) in the latest versions of the four programs at the time the study was conducted. Nineteen of them (12 bugs and 7 bad comments) have already been confirmed by the corresponding developers, while the others are currently being analyzed by the developers.

Lastly, we proposed and implemented aComment to detect operating system concurrency bugs and handle the complex interaction between interrupts and locks. Specifically, we designed a new type of interrupt-related annotation and semi-automatically generated 96,821 such annotations for the Linux kernel. These annotations were automatically propagated from 246 seed annotations, which were directly inferred from comments and code assertions. By extracting annotations from both comments and code, we are able to extract more annotations than by using a single source, as only a small number (6) of the annotations can be extracted from both sources. These annotations were checked against the source code to detect software bugs, and 9 bugs were detected in the latest version of the Linux kernel at the time of writing.
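
The propagation of seed annotations described for aComment can be pictured with a small worklist sketch. The abstract does not state the propagation rule, so the code below assumes one plausible rule: if a callee must be entered with interrupts disabled, and its caller does not disable interrupts itself, then the caller inherits the same precondition. All names here (`propagate_irq_disabled`, `call_graph`, `disables_irq`, and the toy functions `a` to `e`) are hypothetical and are not taken from the thesis.

```python
from collections import deque


def propagate_irq_disabled(call_graph, seeds, disables_irq):
    """Propagate an 'interrupts must be disabled on entry' annotation.

    call_graph   : dict mapping a caller name to the set of its callees
    seeds        : functions annotated directly (e.g. from comments/assertions)
    disables_irq : functions assumed to disable interrupts themselves, so the
                   precondition does not flow up to their own callers
    This is an illustrative simplification, not the thesis's algorithm.
    """
    # Invert the call graph so we can walk from callee to callers.
    callers = {}
    for caller, callees in call_graph.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    annotated = set(seeds)
    work = deque(sorted(seeds))
    while work:
        fn = work.popleft()
        for caller in sorted(callers.get(fn, ())):
            # Stop at callers that satisfy the precondition themselves,
            # and skip callers that are already annotated.
            if caller in disables_irq or caller in annotated:
                continue
            annotated.add(caller)
            work.append(caller)
    return annotated


# Toy call graph: a -> b -> c, d -> c, e -> d; only c carries a seed annotation.
toy_graph = {"a": {"b"}, "b": {"c"}, "d": {"c"}, "e": {"d"}}
result = propagate_irq_disabled(toy_graph, seeds={"c"}, disables_irq={"d"})
```

In this toy graph the annotation reaches `b` and `a`, but not `d` (assumed to disable interrupts before calling `c`) or its caller `e`. A checker in this style would then flag any call site that reaches an annotated function with interrupts enabled.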

Bibliographic record

  • Author: Tan, Lin
  • Author affiliation: University of Illinois at Urbana-Champaign
  • Degree-granting institution: University of Illinois at Urbana-Champaign
  • Discipline: Computer Science
  • Degree: Ph.D.
  • Year: 2009
  • Pages: 107 p.
  • Total pages: 107
  • Original format: PDF
  • Language: English
