Energy-efficient Inference Service of Transformer-based Deep Learning Models on GPUs

Abstract

Inference-as-a-service (IAAS) has recently been launched by cloud service providers to support on-demand AI applications. Many natural language processing (NLP) services are based on the Transformer sequence transduction model. However, the inference process of the Transformer model consumes a significant amount of energy due to the large model size (e.g., billions of parameters) and the enormous amount of computation involved. How to reduce the energy consumption of IAAS without violating the service-level agreement (SLA) has become a practical challenge for service providers. In this work, we conduct a comprehensive study of the inference performance and energy efficiency of a Transformer model trained for a language translation service. First, we empirically characterize essential performance metrics, including latency, throughput, and energy consumption, on three different GPUs under diverse workload configurations. This detailed workload separation enables a thorough understanding of the Transformer inference process. Second, we provide an energy consumption model for the Transformer based on the observed data. Finally, we propose the Aligned scheduling scheme, which improves throughput by up to 2.86× and energy efficiency by up to 2.73×, at the cost of a 40% average latency increase. Our findings provide a full-scope view of Transformer inference and suggest that workload balancing and scheduling have great potential to deliver energy-efficient Transformer inference services.
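The abstract describes characterizing latency, throughput, and energy on GPUs under different workload configurations, but the measurement code itself is not given here. Below is a minimal, hypothetical Python sketch of one way such per-workload measurements could be collected, assuming PyTorch for inference and NVIDIA's pynvml for board power readings; the `measure_inference` helper, the fixed GPU index, and the once-per-iteration power sampling are illustrative assumptions, not the authors' implementation.

```python
import time
import torch
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # hypothetical single-GPU setup

def measure_inference(model, batch, n_iters=50):
    """Return (avg latency [s], throughput [seq/s], energy [J]) for one workload."""
    model.eval()
    torch.cuda.synchronize()
    power_samples = []                      # instantaneous board power in watts
    start = time.time()
    with torch.no_grad():
        for _ in range(n_iters):
            model(batch)                    # one forward (inference) pass
            torch.cuda.synchronize()
            # nvmlDeviceGetPowerUsage reports milliwatts; convert to watts
            power_samples.append(pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0)
    elapsed = time.time() - start
    latency = elapsed / n_iters                                   # seconds per batch
    throughput = batch.size(0) / latency                          # sequences per second
    energy = (sum(power_samples) / len(power_samples)) * elapsed  # avg W * s = J
    return latency, throughput, energy
```

A more rigorous setup would likely sample power on a separate thread at a fixed interval and sweep batch size and sequence length to realize the diverse workload configurations mentioned above; this sketch only illustrates the shape of the measurement loop.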