Journal: Empirical Software Engineering

Addressing problems with replicability and validity of repository mining studies through a smart data platform


Abstract

The use of empirical methods has become common in software engineering. This trend has spawned hundreds of publications whose results help to understand and improve the software development process. Due to the data-driven nature of this avenue of investigation, we identified several problems within the current state of the art that pose a threat to the replicability and validity of approaches. The heavy re-use of data sets in many studies may invalidate the results if problems with the data itself are identified. Moreover, for many studies the data and/or the implementations are not available, which hinders replication of the results and thereby decreases the comparability between studies. Furthermore, many studies use small data sets comprising fewer than 10 projects. This poses a threat especially to the external validity of these studies. Even if all information about a study is available, the diversity of the tooling used can still make replication very hard. Within this paper, we discuss a potential solution to these problems through a cloud-based platform that integrates data collection and analytics. We created SmartSHARK, which implements our approach. Using SmartSHARK, we collected data from several projects and created different analytic examples. Within this article, we present SmartSHARK and discuss our experiences with its use and with the problems mentioned above. Additionally, we show how we have addressed the issues that we identified during our work with SmartSHARK.
