International Conference on Software Engineering

Empirical investigation of a novel approach to check the integrity of software engineering measuring processes (poster session)

Abstract

This distribution is counter-intuitive for at least two reasons. First, it would seem "obvious" that numbers drawn from a list generated by widely different arbitrary processes would have roughly equal probabilities of having 1 or 9 as their first digit. This is not normally the case. If the list of numbers has no artificial limits and does not include invented numbers such as postal codes, then approximately 30% of the numbers will have 1 as their first digit, but only about 5% will have 9 as their first digit. Deviations from the expected Benford distribution indicate the presence of some special characteristic of the data. The second, more theoretically challenging, problem is: what underlying property, shared by so many widely different processes, generates lists of numbers that follow Benford's Law?
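
The 30% and 5% figures quoted above agree with the usual statement of Benford's Law, P(d) = log10(1 + 1/d) for a leading digit d from 1 to 9. The abstract does not quote this formula, so the following is only a minimal sketch under that assumption; the function name `benford_probability` is illustrative and not taken from the study.

```python
import math


def benford_probability(digit: int) -> float:
    """Probability that `digit` (1-9) is the leading digit under Benford's Law.

    Assumes the standard form P(d) = log10(1 + 1/d); the abstract itself
    only quotes the approximate values for d = 1 and d = 9.
    """
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)


if __name__ == "__main__":
    # Print the expected share of each leading digit, rounded to three places.
    for d in range(1, 10):
        print(f"{d}: {benford_probability(d):.3f}")
```

Under this formula the nine probabilities sum to 1, with digit 1 at roughly 0.301 and digit 9 at roughly 0.046.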

We have conducted an empirical investigation to determine under what circumstances various software metrics follow Benford's Law, and whether any special characteristics or irregularities in the data can be uncovered when the data are found not to follow the law. The trickier problem of understanding why a list of metrics might follow Benford's Law is left to another study.

Lists were formed from three software metrics extracted from 100 public domain industrial Java projects. These metrics were Lines of Code (LOC), Fan-Out (FO) and McCabe Cyclomatic Complexity (MCC). Given that a Benford's Law analysis requires a list of considerable length, the data were divided into two groups. The first group came from projects containing more than 100 files. This group was intended as the "control group" and was expected to follow Benford's Law if that law is applicable to the analysis of software engineering metrics. To study the sensitivity of the digital analysis technique to project size, projects with a smaller number of files were compared to the control group.
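
The grouping and first-digit tally described above might be sketched as follows. This is an illustration only: the record layout (`file_count`, `loc`), the function names, and the sample values are all hypothetical, and the abstract does not describe the tooling actually used.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple


def first_digit(value: int) -> Optional[int]:
    """Return the leading digit of a positive integer, or None for values <= 0."""
    if value <= 0:
        return None
    return int(str(value)[0])


def split_by_size(projects: List[dict], threshold: int = 100) -> Tuple[List[dict], List[dict]]:
    """Split projects into (more than `threshold` files, the rest)."""
    large = [p for p in projects if p["file_count"] > threshold]
    small = [p for p in projects if p["file_count"] <= threshold]
    return large, small


def first_digit_distribution(values: Iterable[int]) -> Dict[int, float]:
    """Observed share of each leading digit 1-9 among the positive values."""
    digits = [d for d in (first_digit(v) for v in values) if d is not None]
    counts = Counter(digits)
    total = len(digits)
    return {d: (counts[d] / total if total else 0.0) for d in range(1, 10)}


if __name__ == "__main__":
    # Made-up records for illustration; these are not data from the study.
    projects = [
        {"name": "demo-large", "file_count": 150, "loc": [12, 134, 18, 250, 9]},
        {"name": "demo-small", "file_count": 40, "loc": [7, 31, 105]},
    ]
    control, smaller = split_by_size(projects)
    pooled = [v for p in control for v in p["loc"]]
    print(first_digit_distribution(pooled))
```

Pooling the per-file values of every project in a group gives one long list per metric, which is the form a first-digit analysis needs.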

The empirical results indicate that the first digits of numbers in lists of LOC metrics extracted from the projects followed the probabilities predicted by Benford's Law more closely than the "equal probability of occurrence" suggested by intuitive reasoning. This was shown using both qualitative and quantitative measures. The FO and MCC metrics did not follow the standard Benford's Law as well as the LOC metrics did. This is because the FO and MCC lists contain a significant number of values less than 10 and follow a different first-digit distribution. Further investigation of the digital analysis technique is necessary to evaluate the applicability of Benford's Law in the broader context of software metrics.
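
The abstract does not say which quantitative measure was used. As one possible illustration, a Pearson chi-square statistic can compare observed first-digit counts against both the Benford expectation and the "equal probability" alternative. The sketch below assumes that choice of statistic and uses powers of two as stand-in data, since they are a textbook example of a Benford-like list; none of it is data or method taken from the study.

```python
import math
from typing import List, Sequence

# Expected leading-digit probabilities for digits 1..9 under each hypothesis.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]
UNIFORM = [1 / 9] * 9


def leading_digit_counts(values: Sequence[int]) -> List[int]:
    """Count how often each digit 1-9 leads the positive integers in `values`."""
    counts = [0] * 9
    for v in values:
        if v > 0:
            counts[int(str(v)[0]) - 1] += 1
    return counts


def chi_square(observed_counts: Sequence[int], expected_probs: Sequence[float]) -> float:
    """Pearson chi-square statistic; smaller means a closer fit to `expected_probs`."""
    total = sum(observed_counts)
    return sum(
        (obs - total * p) ** 2 / (total * p)
        for obs, p in zip(observed_counts, expected_probs)
    )


if __name__ == "__main__":
    # Stand-in data (an assumption for illustration): the first 200 powers of two.
    sample = [2 ** k for k in range(200)]
    counts = leading_digit_counts(sample)
    print("vs Benford:", chi_square(counts, BENFORD))
    print("vs uniform:", chi_square(counts, UNIFORM))
```

A list that fits Benford's Law should give a much smaller statistic against `BENFORD` than against `UNIFORM`. A list dominated by single-digit values, as reported for FO and MCC, would call for a different reference distribution than either of these.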
