Declarative Parameterizations of User-Defined Functions for Large-Scale Machine Learning and Optimization

Gao Zekai J.; Pansare Niketan; Jermaine Christopher

Abstract

Large-scale optimization has become an important application for data management systems, particularly in the context of statistical machine learning. In this paper, we consider how one might implement the join-and-co-group pattern in a fully declarative data processing system. The join-and-co-group pattern is ubiquitous in iterative, large-scale optimization. In this pattern, a user-defined function g is parameterized with a data object x as well as the subset Θ(x) of the statistical model that applies to that object, so that g(x | Θ(x)) can be used to compute a partial update of the model. This is repeated for every x in the full data set X. All partial updates are then aggregated and used to perform a complete update of the model. The join-and-co-group pattern has several implementation challenges, including the potential for a massive blow-up in the size of a fully parameterized model. Thus, unless the correct physical execution plan is chosen for implementing the pattern, an execution can easily take a very long time or even fail to complete. We carefully consider the alternatives for implementing the join-and-co-group pattern on top of a declarative system, as well as how the best alternative can be selected automatically. Our focus is on the SimSQL database system, an SQL-based system with special facilities for large-scale, iterative optimization. Because SimSQL is an SQL-based system with a query optimizer, those choices can be made automatically.
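To make the pattern concrete, here is a minimal in-memory Python sketch under assumed details that do not come from the paper: each data object x is a sparse feature vector with a 0/1 label, the model Θ maps feature ids to weights, Θ(x) is the set of weights for the features x contains, and g computes a logistic-loss gradient. The function names (`join_and_co_group`, `g`, `iterate`, `parameterized_size`) are hypothetical. It illustrates only the join, per-object update, aggregate, and model-update steps; it is not SimSQL's SQL-based implementation or any of the physical plans the paper compares.

```python
import math
from collections import defaultdict


def join_and_co_group(data, theta):
    """Pair each data object x with the model entries Theta(x) it touches.

    Hypothetical setup: x = (features, label), features = {feature_id: value},
    theta = {feature_id: weight}. Yields one (x, Theta(x)) pair per object;
    collecting all pairs at once would materialize the fully parameterized model.
    """
    for features, label in data:
        theta_x = {j: theta.get(j, 0.0) for j in features}  # "join" on feature id
        yield (features, label), theta_x  # "co-group": one group per x


def g(x, theta_x):
    """Partial update for one object: logistic-loss gradient over Theta(x)."""
    features, label = x
    margin = sum(v * theta_x[j] for j, v in features.items())
    p = 1.0 / (1.0 + math.exp(-margin))
    return {j: (p - label) * v for j, v in features.items()}


def iterate(data, theta, lr=0.1):
    """One iteration: apply g to every x, aggregate, then update the model."""
    total = defaultdict(float)
    for x, theta_x in join_and_co_group(data, theta):
        for j, grad in g(x, theta_x).items():
            total[j] += grad  # aggregate partial updates by model key
    return {j: theta.get(j, 0.0) - lr * total.get(j, 0.0)
            for j in set(theta) | set(total)}


def parameterized_size(data):
    """Model entries held if every Theta(x) were materialized simultaneously."""
    return sum(len(features) for features, _ in data)


if __name__ == "__main__":
    data = [({0: 1.0, 1: 2.0}, 1), ({1: 1.0, 2: 1.0}, 0)]
    theta = {0: 0.0, 1: 0.0, 2: 0.0}
    for _ in range(3):
        theta = iterate(data, theta)
    print(theta, parameterized_size(data))
```

In this toy version the generator hands out (x, Θ(x)) pairs one at a time. A plan that instead materializes every pair holds `parameterized_size(data)` model entries, which can far exceed the size of Θ itself when many objects share the same model entries. That gap is the kind of blow-up the abstract refers to.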

Bibliographic Details

Source: IEEE Transactions on Knowledge and Data Engineering, 2019, no. 11, pp. 2079-2092 (14 pages)
Authors: Gao Zekai J.; Pansare Niketan; Jermaine Christopher
Affiliations:

Rice University, Houston, TX 77251, USA

IBM Research Almaden, San Jose, CA 95120, USA
Full-text format: PDF
Language: English
Keywords: Large-scale machine learning; user-defined functions; declarative systems; join-and-co-group

Similar Articles

1. Compressed Linear Algebra for Declarative Large-Scale Machine Learning [J]. Elgohary Ahmed, Boehm Matthias, Haas Peter J. Communications of the ACM, 2019, no. 5.

2. A Scalable Molecular Force Field Parameterization Method Based on Density Functional Theory and Quantum-Level Machine Learning [J]. Galvelis Raimondas, Doerr Stefan, Damas Joao M. Journal of Chemical Information and Modeling, 2019, no. 8.

3. Learning the Complete-Basis-Functions Parameterization for the Optimization of Dynamic Molecular Alignment by ES [C]. Ofer M. Shir, Joost N. Kok, Thomas Baeck. Intelligent Data Engineering and Automated Learning (IDEAL 2006), Lecture Notes in Computer Science, vol. 4224, 2006.

4. Efficient and Scalable Optimization Methods for Training Large-Scale Machine Learning Models [D]. Jahani, Majid. 2021.

5. Functional Network Construction in Arabidopsis Using Rule-Based Machine Learning on Large-Scale Data Sets [O]. George W. Bassel, Enrico Glaab, Julietta Marquez. 2011.

6. Parameterization of the collision-coalescence process using series of basis functions: COLNETv1.0.0 model development using a machine learning approach [O]. Camilo Fernando Rodríguez-Genó, Léster Alfonso. 2021.
