IEEE International Conference on Software Analysis, Evolution and Reengineering

On the Co-evolution of ML Pipelines and Source Code - Empirical Study of DVC Projects


Abstract

The growing popularity of machine learning (ML) applications has led to the introduction of software engineering tools such as Data Version Control (DVC), MLflow and Pachyderm that enable versioning of ML data, models, pipelines and model evaluation metrics. Since these versioned ML artifacts need to be synchronized not only with each other, but also with the source and test code of the software applications into which the models are integrated, prior findings on co-evolution and coupling between software artifacts might need to be revisited. Hence, in order to understand the degree of coupling between ML-related and other software artifacts, as well as the adoption of ML versioning features, this paper empirically studies the usage of DVC in 391 GitHub projects, 25 of which are examined in detail. Our results show that more than half of the DVC files in a project are changed at least once every one-tenth of the project’s lifetime. Furthermore, we observe a tight coupling between DVC files and other artifacts: one in four pull requests that change source code, and one in two pull requests that change tests, also require a change to DVC files. As additional evidence of the complexity associated with adopting ML-related software engineering tools like DVC, an average of 78% of the studied projects showed a non-constant trend in pipeline complexity.
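The coupling figures in the abstract can be read as conditional co-change ratios: among pull requests that touch one kind of file, the share that also touch a DVC file. The following is a minimal Python sketch of that reading, not the paper's actual procedure, which is not described on this page. The file classification rules (`is_dvc_file`, `is_test_file`, `is_source_file`), the function `coupling`, and the toy pull requests are all assumptions made for illustration; only the names `dvc.yaml`, `dvc.lock` and the `.dvc` suffix come from DVC itself.

```python
from typing import Callable, List, Set


def is_dvc_file(path: str) -> bool:
    """Assumed rule: DVC metadata files are *.dvc, dvc.yaml and dvc.lock."""
    name = path.rsplit("/", 1)[-1]
    return name.endswith(".dvc") or name in {"dvc.yaml", "dvc.lock"}


def is_test_file(path: str) -> bool:
    """Hypothetical heuristic: Python files named test_*.py or *_test.py."""
    name = path.rsplit("/", 1)[-1]
    return name.endswith(".py") and (
        name.startswith("test_") or name.endswith("_test.py")
    )


def is_source_file(path: str) -> bool:
    """Hypothetical heuristic: any Python file that is not a test file."""
    return path.endswith(".py") and not is_test_file(path)


def coupling(
    pull_requests: List[Set[str]],
    is_antecedent: Callable[[str], bool],
) -> float:
    """Share of PRs touching an antecedent file that also touch a DVC file."""
    relevant = [pr for pr in pull_requests if any(is_antecedent(p) for p in pr)]
    if not relevant:
        return 0.0
    co_changed = [pr for pr in relevant if any(is_dvc_file(p) for p in pr)]
    return len(co_changed) / len(relevant)


# Toy data (invented, not taken from the study): each set holds the
# paths changed by one pull request.
prs = [
    {"src/train.py", "dvc.yaml"},
    {"src/train.py"},
    {"src/model.py", "README.md"},
    {"src/eval.py", "tests/test_eval.py", "data/raw.csv.dvc"},
    {"tests/test_train.py"},
]

source_coupling = coupling(prs, is_source_file)  # 2 of 4 source PRs
test_coupling = coupling(prs, is_test_file)      # 1 of 2 test PRs
print(source_coupling, test_coupling)
```

On this toy input, both ratios come out to 0.5. Under this reading, the abstract's "one in four" and "one in two" would correspond to ratios of about 0.25 and 0.5 for source and test pull requests respectively.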
