Improving automated requirements trace retrieval through term-based enhancement strategies.

Abstract

Requirements traceability is concerned with managing and documenting the life of requirements. Its primary goal is to support critical software development activities. Examples include evaluating whether a developed software system satisfies the specified set of requirements, checking that all requirements have been implemented by the end of the lifecycle, and analyzing the impact of proposed changes on the system.

Various approaches for improving requirements traceability practices have been proposed in recent years. Automated traceability methods that use information retrieval (IR) techniques have been recognized as an effective way to support trace generation and retrieval. IR-based approaches significantly reduce the human effort involved in manual trace generation and maintenance. They also allow the analyst to perform tracing on an "as-needed" basis.

IR-based automated traceability tools typically retrieve a large number of potentially relevant traceability links between requirements and other software artifacts, so that as many true links as possible are returned to the analyst. As a result, the precision of the retrieval results is generally low, and the analyst often has to filter out a large number of unwanted links by hand. This low precision limits the usefulness of IR-based tools. The analyst's confidence in the approach can be undermined both by a large number of incorrectly retrieved traces and by true traces that are missed. In this thesis we present three enhancement strategies that aim to improve the precision of trace retrieval results while still striving to retrieve a large number of traceability links. The three strategies are:

(1) Query term coverage (TC). This strategy assumes that a software artifact sharing a larger proportion of distinct words with a requirement is more likely to be relevant to that requirement.
This proportion is what the thesis calls query term coverage (TC). A new approach is introduced that incorporates the TC factor into the basic IR model. It raises the relevance ranking of query-document pairs that share two or more distinct terms, which improves retrieval precision.

(2) Phrasing. Standard IR models compute similarity scores for links between a query and a document from the distribution of single terms in the document collection. Several studies in the general IR area have shown that phrases can describe document content more accurately and can therefore improve retrieval [21, 23, 52]. This thesis therefore presents an approach that uses phrase detection to enhance the basic IR model and improve its retrieval accuracy.

(3) Utilizing a project glossary. Terms and phrases defined in the project glossary tend to capture the critical meaning of a project. They can therefore be regarded as more meaningful than more general terms for detecting relations between documents. This thesis introduces a new enhancement technique that uses the information in the project glossary to increase the weights of the terms and phrases it contains. The strategy aims to raise the relevance ranking of documents containing glossary items and thereby improve retrieval precision.

The thesis presents how these three enhancement strategies are incorporated into the basic IR model, both individually and in combination.

The work presented in this thesis supports the development and application of automated tracing tools. All three strategies share the goal of improving the precision of retrieval results. Low precision is a major concern with IR-based tracing methods.
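The abstract describes the three strategies only at a high level. The following Python sketch is therefore a hypothetical illustration of where each one could plug into a retrieval model, not the thesis's actual formulas. The sketch makes these assumptions:

- It swaps in a tf-idf vector space model with cosine similarity as the "basic IR model". The thesis may use a different base model.
- It treats adjacent word pairs as candidate phrases.
- It multiplies the weight of glossary terms and phrases by an arbitrary `boost` factor.
- It applies an invented TC multiplier of `1 + coverage`, and only when a requirement and an artifact share at least two distinct words.

All function names, parameters, and the toy artifacts (`score`, `use_tc`, `use_phrases`, `glossary`, `boost`) are made up for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def features(text, use_phrases=False):
    """Single words, plus adjacent word pairs as crude 'phrases' if requested."""
    words = tokenize(text)
    feats = list(words)
    if use_phrases:
        feats += [f"{a} {b}" for a, b in zip(words, words[1:])]
    return feats


def idf_table(docs):
    """Smoothed IDF, log(1 + N/df), for every word and word pair in the collection."""
    df = Counter()
    for d in docs:
        df.update(set(features(d, use_phrases=True)))
    return {t: math.log(1 + len(docs) / n) for t, n in df.items()}


def vector(text, idf, use_phrases=False, glossary=frozenset(), boost=2.0):
    """tf-idf weights; glossary terms/phrases are multiplied by `boost` (assumed factor)."""
    vec = {}
    for t, tf in Counter(features(text, use_phrases)).items():
        w = tf * idf.get(t, 0.0)  # terms unseen in the collection cannot match: weight 0
        vec[t] = w * boost if t in glossary else w
    return vec


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def score(query, doc, idf, use_tc=False, use_phrases=False, glossary=frozenset(), boost=2.0):
    """Basic cosine score, optionally adjusted by the hypothetical enhancements."""
    s = cosine(vector(query, idf, use_phrases, glossary, boost),
               vector(doc, idf, use_phrases, glossary, boost))
    if use_tc:
        q = set(tokenize(query))
        shared = q & set(tokenize(doc))
        if len(shared) >= 2:  # only pairs sharing two or more distinct words are promoted
            s *= 1 + len(shared) / len(q)  # coverage = shared fraction of query words
    return s


if __name__ == "__main__":
    artifacts = [  # toy artifacts invented for this example
        "The system shall encrypt patient records",
        "Patient records are stored in the database",
        "The user interface displays a login screen",
    ]
    requirement = "Encrypt all patient records"
    glossary = {"patient records"}  # hypothetical project glossary entry
    idf = idf_table(artifacts)
    for doc in artifacts:
        base = score(requirement, doc, idf)
        tc = score(requirement, doc, idf, use_tc=True)
        gl = score(requirement, doc, idf, use_phrases=True, glossary=glossary)
        print(f"{base:.3f} {tc:.3f} {gl:.3f}  {doc}")
```

For each toy artifact, running the file prints three scores: the plain score, the TC-adjusted score, and the phrase-plus-glossary score. A multiplicative TC factor was chosen here only because it leaves artifacts with no shared words at zero. Note that the adjusted values are ranking scores and can exceed 1. How the thesis actually combines TC, phrases, and glossary weights with its base model is not stated in this abstract.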
The thesis also presents predictors for the individual enhancement strategies. These predictors can be used to identify which strategy will be effective for a specific tracing task. They could serve as the basis for intelligent tracing tools that use the metric values to decide automatically which enhancement strategy to apply for the best retrieval results. A tracing tool that incorporates one or more of these methods is expected to achieve higher precision in its trace retrieval results than the basic IR model. Such an improvement would reduce the analyst's effort in inspecting the retrieval results. It would also increase the analyst's confidence in the accuracy of the tracing tool. (Abstract shortened by UMI.)
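Precision is the fraction of retrieved links that are true links, and it is the measure these strategies target. Recall is the fraction of true links that are retrieved. The short Python sketch below shows how raising a similarity threshold trades recall for precision. It uses made-up link scores and a made-up answer set. The helpers `precision_recall` and `links_above` are hypothetical names for illustration, not part of any tracing tool.

```python
def precision_recall(retrieved, true_links):
    """Precision = |retrieved & true| / |retrieved|; recall = |retrieved & true| / |true|."""
    retrieved, true_links = set(retrieved), set(true_links)
    hits = len(retrieved & true_links)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(true_links) if true_links else 0.0
    return precision, recall


def links_above(scored, threshold):
    """Candidate (requirement, artifact) links whose score meets the threshold."""
    return [link for link, s in scored.items() if s >= threshold]


if __name__ == "__main__":
    # Toy similarity scores and answer set, invented for illustration only.
    scored = {
        ("R1", "C1"): 0.82, ("R1", "C2"): 0.45, ("R1", "C3"): 0.12,
        ("R2", "C2"): 0.67, ("R2", "C4"): 0.30, ("R2", "C5"): 0.05,
    }
    answer_set = {("R1", "C1"), ("R2", "C2"), ("R2", "C4")}
    for threshold in (0.0, 0.4, 0.6):
        p, r = precision_recall(links_above(scored, threshold), answer_set)
        print(f"threshold={threshold:.1f}  precision={p:.2f}  recall={r:.2f}")
    # Prints:
    # threshold=0.0  precision=0.50  recall=1.00
    # threshold=0.4  precision=0.67  recall=0.67
    # threshold=0.6  precision=1.00  recall=0.67
```

In this toy data, retrieving every candidate yields perfect recall but only 50% precision. A stricter threshold removes false links at the cost of a missed true link, which is the trade-off the enhancement strategies try to soften by ranking true links higher.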

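Bibliographic details

  • Author: Zou, Xuchang
  • Author affiliation: DePaul University
  • Degree-granting institution: DePaul University
  • Discipline: Computer Science
  • Degree: Ph.D.
  • Year: 2009
  • Pages: 131
  • Original format: PDF
  • Language: English
  • Chinese Library Classification: Automation technology, computer technology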
