Ease.ml/ci and Ease.ml/meter in Action: Towards Data Management for Statistical Generalization


Abstract

Developing machine learning (ML) applications is similar to developing traditional software: it is often an iterative process in which developers navigate a rich space of requirements, design decisions, implementations, empirical quality, and performance. In traditional software development, software engineering is the field of study that provides principled guidelines for this iterative process. However, as of today, the counterpart "software engineering for ML" is largely missing: developers of ML applications are left with powerful tools (e.g., TensorFlow and PyTorch) but little guidance regarding the development lifecycle itself. In this paper, we view the management of ML development lifecycles from a data management perspective. We demonstrate two closely related systems, ease.ml/ci and ease.ml/meter, that provide some "principled guidelines" for ML application development: ci is a continuous integration engine for ML models and meter is a "profiler" for controlling overfitting of ML models. Both systems focus on managing the "statistical generalization power" of datasets used for assessing the quality of ML applications, namely, the validation set and the test set. By demonstrating these two systems we hope to spawn further discussions within our community on building this new type of data management system for statistical generalization.
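
To make the notion of a test set's "statistical generalization power" concrete, the following minimal sketch estimates how many labeled test examples are needed to measure accuracy within a tolerance `epsilon` at confidence `1 - delta`, using Hoeffding's inequality with a union bound over several evaluations. This is an illustrative assumption and not the procedure stated in the abstract: the function name `required_test_size` and its parameters are hypothetical, and the bound only holds when the evaluated models are fixed in advance (non-adaptive reuse of the test set).

```python
import math


def required_test_size(epsilon: float, delta: float, num_evaluations: int = 1) -> int:
    """Number of i.i.d. test examples so that every one of `num_evaluations`
    accuracy estimates is within `epsilon` of the true accuracy with
    probability at least 1 - delta.

    Uses Hoeffding's inequality plus a union bound:
        n >= ln(2 * H / delta) / (2 * epsilon^2)
    Assumes the evaluated models do not depend on earlier test results.
    """
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon must be in (0, 1)")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")
    if num_evaluations < 1:
        raise ValueError("num_evaluations must be at least 1")
    bound = math.log(2.0 * num_evaluations / delta) / (2.0 * epsilon ** 2)
    return math.ceil(bound)


if __name__ == "__main__":
    # One evaluation versus 32 evaluations on the same test set.
    print(required_test_size(0.05, 0.05))
    print(required_test_size(0.05, 0.05, num_evaluations=32))
```

Under these assumptions the required size grows only logarithmically with the number of evaluations but quadratically as the tolerance shrinks, which is why a test set of fixed size can support only a limited number of trustworthy quality checks.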

Bibliographic details

Source
International Conference on Very Large Data Bases | 2019 | pp. 1962-1965 | 4 pages
Authors
Cedric Renggli; Frances Ann Hubis; Bojan Karlas; Kevin Schawinski; Wentao Wu; Ce Zhang
Original format: PDF