International Conference on Field-Programmable Technology

INFER: INterFerence-aware Estimation of Runtime for Concurrent CNN Execution on DPUs



Abstract

The Deep Learning Processor Unit (DPU) from Xilinx is among the numerous accelerators that have been proposed to speed up the execution of Convolutional Neural Networks (CNNs) on embedded platforms. DPUs are available in different configurable sizes and can execute any given CNN. Neural network researchers are also rapidly releasing newer CNN algorithms with improved performance (typically higher prediction accuracy), traded off against size or energy consumption for embedded applications. To enable quick evaluation of choices among evolving CNN algorithms and accelerator configurations, we propose INFER (INterFerence-aware Estimation of Runtime). INFER is a framework to estimate the execution time of any CNN on a given size of DPU without actual implementation. Further, current FPGA platforms can implement multiple DPUs, whereas many applications consist of multiple sub-tasks, each requiring a separate and/or different CNN. In such scenarios of concurrent use of multiple DPUs on an FPGA, INFER can also estimate the additional execution time incurred by the sharing of memory bandwidth. Our evaluation on various mixes of 16 standard CNNs and eight DPU configurations shows that INFER has an average prediction error of 6.6%, which can be useful for design space exploration as well as scheduling on multi-DPU platforms.
机译:来自Xilinx的深度学习处理器单元(DPU)是众多加速器,已经提出加快嵌入式平台上的卷积神经网络(CNNS)的执行。 DPU以不同的可配置尺寸提供,可以执行任何给定的CNN。神经网络研究人员还迅速推出了具有改进的性能(通常更高的预测精度)的新型CNN算法,其嵌入式应用的尺寸或能耗的折衷。为了在不断变化的CNN算法和加速器配置中可以快速评估选择,我们提出推断(运行时的干扰感知估计)。推断是一个框架,用于在没有实际实现的情况下估计在给定大小的DPU大小的任何CNN的执行时间。此外,当前的FPGA平台能够实现多个DPU,而许多应用程序由多个子任务组成,其中每个需要单独和/或不同的CNN。在FPGA上同时使用多个DPU的这种情况下,推断也能够估计由于内存带宽的共享而执行的额外时间。我们对16个标准CNN的各种混合物的评估和DPU的8个配置显示推断的平均预测误差为6.6%,这对于设计空间探索以及在多DPU平台中的调度非常有用。
