Conference: International Conference on Very Large Data Bases

Ease.ml/ci and Ease.ml/meter in Action: Towards Data Management for Statistical Generalization

Abstract

Developing machine learning (ML) applications is similar to developing traditional software: it is often an iterative process in which developers navigate a rich space of requirements, design decisions, implementations, empirical quality, and performance. In traditional software development, software engineering is the field of study that provides principled guidelines for this iterative process. However, as of today, the counterpart "software engineering for ML" is largely missing: developers of ML applications are left with powerful tools (e.g., TensorFlow and PyTorch) but little guidance regarding the development lifecycle itself. In this paper, we view the management of ML development lifecycles from a data management perspective. We demonstrate two closely related systems, ease.ml/ci and ease.ml/meter, that provide some "principled guidelines" for ML application development: ci is a continuous integration engine for ML models and meter is a "profiler" for controlling overfitting of ML models. Both systems focus on managing the "statistical generalization power" of datasets used for assessing the quality of ML applications, namely, the validation set and the test set. By demonstrating these two systems, we hope to spawn further discussions within our community on building this new type of data management system for statistical generalization.
