IEEE Infrastructure Conference

Serving Very Large Numbers of Low Latency AutoML Models



Abstract

Summary form only given, as follows; the complete presentation was not made available for publication as part of the conference proceedings. ML serving infrastructure is becoming ubiquitous across the emerging ML industry and public cloud offerings. Existing solutions overwhelmingly serve models as containers, where one container hosts a single model with all of its required dependencies. Salesforce and the Einstein Platform take a unique multi-tenancy approach that relies heavily on AutoML, with automated feature engineering, training, and serving of a separate model per tenant. This approach lets us scale to serving hundreds of thousands of models. Depending on the application and the type and size of customer data, model sizes, initialization times, and popularity/volume can vary widely, introducing a model balancing problem. We present our approach to scaling to a large number of models: multi-level routing and load balancing, sharing hundreds of models within each container, and sophisticated metric-driven mechanisms for model initialization, warmup, and model balancing. We will also present our solution for managing model versions and dependencies in shared-container scenarios, and finally, lessons learned on our journey in this nascent space.
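The abstract describes two key ideas: routing a request for a given model to one of many serving containers, and letting hundreds of models share a container rather than dedicating a container per model. The talk itself was not published, so the sketch below is only an illustrative guess at the general pattern, not Salesforce's actual implementation: consistent-hash routing from model ID to container, plus a per-container LRU cache of loaded models (all class and function names here are hypothetical).

```python
import bisect
import hashlib
from collections import OrderedDict


def _hash(key: str) -> int:
    """Stable hash for ring placement and lookup."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class ConsistentHashRouter:
    """First-level routing: map a model ID to one of many serving containers.

    Virtual nodes spread each container around the ring so that adding or
    removing a container only remaps a small fraction of models.
    """

    def __init__(self, containers, vnodes=100):
        self._ring = sorted(
            (_hash(f"{c}#{i}"), c) for c in containers for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    def route(self, model_id: str) -> str:
        # First ring position at or after the model's hash, wrapping around.
        idx = bisect.bisect(self._keys, _hash(model_id)) % len(self._ring)
        return self._ring[idx][1]


class ModelCache:
    """Per-container model pool: hundreds of models share one container,
    with least-recently-used eviction when capacity is reached."""

    def __init__(self, capacity: int, loader):
        self.capacity = capacity
        self.loader = loader  # called on cache miss to load/initialize a model
        self._models = OrderedDict()

    def get(self, model_id: str):
        if model_id in self._models:
            self._models.move_to_end(model_id)  # mark as recently used
        else:
            if len(self._models) >= self.capacity:
                self._models.popitem(last=False)  # evict least-recently-used
            self._models[model_id] = self.loader(model_id)
        return self._models[model_id]
```

A real system layered on this would add the metric-driven parts the abstract mentions, e.g. pre-warming popular models instead of loading them lazily, and rebalancing the ring when per-container load skews.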
