
Source code retrieval from large software libraries for automatic bug localization.


Abstract

This dissertation advances the state of the art in information retrieval (IR) based approaches to automatic bug localization in software. In an IR-based approach, one first creates a search engine using a probabilistic or a deterministic model for the files in a software library. Subsequently, a bug report is treated as a query to the search engine for retrieving the files relevant to the bug. With regard to the new work presented, we first demonstrate the importance of taking version histories of the files into account for achieving significant improvements in the precision with which the files related to a bug are located. This is motivated by the realization that files that have not changed in a long time are likely to have "stabilized" and are therefore less likely to contain bugs. Subsequently, we look at the difficulties created by the fact that developers frequently use abbreviations and concatenations that are not likely to be familiar to someone trying to locate the files related to a bug. We show how an initial query can be automatically reformulated to include the relevant terms actually used in the files: the files retrieved in response to the original query are analyzed for terms that are proximal to the original query terms. The last part of this dissertation generalizes our term-proximity based work by using Markov Random Fields (MRF) to model the inter-term dependencies in a query vis-a-vis the files. Our MRF work redresses one of the major defects of the most commonly used modeling approaches in IR, which is the loss of all inter-term relationships in the documents.
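As a rough illustration of the first idea in the abstract (a bug report used as a query, with files that changed recently favored over files that have "stabilized"), the following minimal sketch ranks files by a plain TF-IDF text score multiplied by an exponential recency prior. It is not the dissertation's model: the TF-IDF weighting, the exponential decay, the `decay` parameter, and the names `tokenize`, `rank_files`, and `days_since_change` are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_files(files, days_since_change, bug_report, decay=0.01):
    """Rank files for a bug report by TF-IDF score times a recency prior.

    files: dict mapping file name -> file text
    days_since_change: dict mapping file name -> days since last change
    decay: hypothetical rate; larger values penalize long-unchanged files more
    """
    docs = {name: Counter(tokenize(text)) for name, text in files.items()}
    n_docs = len(docs)

    # Document frequency: in how many files each term occurs.
    df = Counter()
    for counts in docs.values():
        df.update(counts.keys())

    query = tokenize(bug_report)
    scores = {}
    for name, counts in docs.items():
        length = sum(counts.values()) or 1
        # Length-normalized term frequency times a smoothed IDF.
        text_score = sum(
            (counts[t] / length) * math.log((n_docs + 1) / (df[t] + 0.5))
            for t in query
            if t in counts
        )
        # Assumed prior: files untouched for a long time are down-weighted.
        prior = math.exp(-decay * days_since_change[name])
        scores[name] = text_score * prior

    return sorted(scores, key=scores.get, reverse=True)
```

With two files of identical text, the one changed more recently is ranked first, which is the only behavior the prior is meant to show here; a real system would estimate such a prior from the project's actual version history.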

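Bibliographic details

  • Author

    Sisman, Bunyamin

  • Author affiliation

    Purdue University

  • Degree-granting institution: Purdue University
  • Subject: Engineering Computer
  • Degree: Ph.D.
  • Year: 2013
  • Pages: 139 p.
  • Total pages: 139
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification:
  • Keywords:

  • Date indexed: 2022-08-17 11:41:08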
