ACM Transactions on Database Systems

Incremental and Approximate Computations for Accelerating Deep CNN Inference



Abstract

Deep learning now offers state-of-the-art accuracy for many prediction tasks. A form of deep learning called deep convolutional neural networks (CNNs) is especially popular on image, video, and time series data. Due to its high computational cost, CNN inference is often a bottleneck in analytics tasks on such data. Thus, much work in the computer architecture, systems, and compilers communities studies how to make CNN inference faster. In this work, we show that by elevating the abstraction level and re-imagining CNN inference as queries, we can bring to bear database-style query optimization techniques to improve CNN inference efficiency. We focus on tasks that perform CNN inference repeatedly on inputs that are only slightly different. We identify two popular CNN tasks with this behavior: occlusion-based explanations (OBE) and object recognition in videos (ORV). OBE is a popular method for "explaining" CNN predictions. It outputs a heatmap over the input to show which regions (e.g., image pixels) mattered most for a given prediction. It leads to many re-inference requests on locally modified inputs. ORV uses CNNs to identify and track objects across video frames. It also leads to many re-inference requests. We cast such tasks in a unified manner as a novel instance of the incremental view maintenance problem and create a comprehensive algebraic framework for incremental CNN inference that reduces computational costs. We produce materialized views of features computed inside a CNN and connect them with a novel multi-query optimization scheme for CNN re-inference. Finally, we also devise novel OBE-specific and ORV-specific approximate inference optimizations exploiting their semantics. We prototype our ideas in Python to create a tool called KRYPTON that supports both CPUs and GPUs. Experiments with real data and CNNs show that KRYPTON reduces runtimes by up to 5x (respectively, 35x) to produce exact (respectively, high-quality approximate) results without raising resource requirements.
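To make concrete why OBE generates so many re-inference requests, the following is a minimal Python sketch of the naive occlusion loop that incremental re-inference aims to avoid. It is an illustration only, not KRYPTON's implementation: the `model` callable (mapping an image batch to class probabilities), patch size, stride, and fill value are hypothetical placeholders.

```python
import numpy as np

def occlusion_heatmap(model, image, target_class, patch=16, stride=8, fill=0.0):
    """Naive occlusion-based explanation (OBE).

    Slides an occluding patch over the image and re-runs full CNN inference
    at every position; the drop in the target class score at each position
    forms the heatmap. `model` is assumed to map a batch of images of shape
    (N, H, W, C) to class probabilities of shape (N, num_classes).
    """
    h, w = image.shape[:2]
    base_score = model(image[None])[0, target_class]       # unoccluded prediction
    ys = list(range(0, h - patch + 1, stride))
    xs = list(range(0, w - patch + 1, stride))
    heat = np.zeros((len(ys), len(xs)), dtype=np.float32)

    for i, y in enumerate(ys):
        for j, x in enumerate(xs):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch] = fill       # locally modified input
            score = model(occluded[None])[0, target_class]  # one full re-inference
            heat[i, j] = base_score - score                 # larger drop = more important
    return heat
```

Each occlusion position changes only a small patch of the input, yet the naive loop re-runs the entire CNN; materializing the intermediate feature maps of the unmodified input and recomputing only the regions a patch affects is the incremental view maintenance opportunity the paper exploits.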
