IEEE Infrastructure Conference

Serving Very Large Numbers of Low Latency AutoML Models



Abstract

Summary form only given, as follows; the complete presentation was not made available for publication as part of the conference proceedings. ML serving infrastructure is becoming ubiquitous across the emerging ML industry and public cloud offerings. Existing solutions overwhelmingly serve models as containers, where one container hosts a single model with all of its required dependencies. Salesforce and the Einstein Platform take a unique multi-tenancy approach that relies heavily on AutoML, with automated feature engineering, training, and serving of a separate model per tenant. This approach lets us scale to serving hundreds of thousands of models. Depending on the application and the type and size of customer data, model sizes, initialization times, and popularity/volume can vary widely, introducing a model balancing problem. We present our approach to scaling to a large number of models: multi-level routing and load balancing, sharing hundreds of models within each container, and sophisticated metric-driven mechanisms for model initialization, warmup, and model balancing. We will also present our solution for managing model versions and dependencies in shared-container scenarios, and finally, lessons learned on our journey in this nascent space.
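The abstract describes two key ideas: routing a request for a given model to one of many serving containers, and letting hundreds of models share a container rather than dedicating a container per model. The talk itself was not published, so the sketch below is only an illustrative guess at the general pattern, not Salesforce's actual implementation: consistent-hash routing from model ID to container, plus a per-container LRU cache of loaded models (all class and function names here are hypothetical).

```python
import bisect
import hashlib
from collections import OrderedDict


def _hash(key: str) -> int:
    """Stable hash for ring placement and lookup."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class ConsistentHashRouter:
    """First-level routing: map a model ID to one of many serving containers.

    Virtual nodes spread each container around the ring so that adding or
    removing a container only remaps a small fraction of models.
    """

    def __init__(self, containers, vnodes=100):
        self._ring = sorted(
            (_hash(f"{c}#{i}"), c) for c in containers for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    def route(self, model_id: str) -> str:
        # First ring position at or after the model's hash, wrapping around.
        idx = bisect.bisect(self._keys, _hash(model_id)) % len(self._ring)
        return self._ring[idx][1]


class ModelCache:
    """Per-container model pool: hundreds of models share one container,
    with least-recently-used eviction when capacity is reached."""

    def __init__(self, capacity: int, loader):
        self.capacity = capacity
        self.loader = loader  # called on cache miss to load/initialize a model
        self._models = OrderedDict()

    def get(self, model_id: str):
        if model_id in self._models:
            self._models.move_to_end(model_id)  # mark as recently used
        else:
            if len(self._models) >= self.capacity:
                self._models.popitem(last=False)  # evict least-recently-used
            self._models[model_id] = self.loader(model_id)
        return self._models[model_id]
```

A real system layered on this would add the metric-driven parts the abstract mentions, e.g. pre-warming popular models instead of loading them lazily, and rebalancing the ring when per-container load skews.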
