Published in: ACM SIGPLAN Notices: A Monthly Publication of the Special Interest Group on Programming Languages

Automatic Locality-Friendly Interface Extension of Numerical Functions

Abstract

Raising the level of abstraction is a key concern of software engineering, and libraries (either used directly or as the target of a program generation system) are a successful technique for raising programmer productivity and improving software quality. Unfortunately, even successful libraries may contain functions that are not general enough. For example, many numeric performance libraries contain functions that work on one- or higher-dimensional arrays. A problem arises if a program wants to invoke such a function on a non-contiguous subarray (e.g., in C, a column of a matrix or a subarray of an image). If the library developer did not foresee this scenario, the client program must include explicit copy steps before and after the library function call, incurring a possibly high performance penalty. A better solution would be an enhanced library function that allows for the desired access pattern. Exposing the access pattern allows the compiler to optimize for the intended usage scenario(s). As we do not want the library developer to write all interesting versions manually, we present a tool that takes a library function written in C and generates such a customized function for typical access patterns. We describe the approach, discuss its limitations, and report on the performance. As example access patterns we consider those most common in numerical applications: striding and block striding, general permutations, and scaling. We evaluate the tool on various library functions including filters, scans, reductions, sorting, FFTs, and linear algebra operations. In most cases the automatically generated custom version is significantly faster than performing the copy steps and the library call separately, with speed-ups typically in the range of 1.2-1.8x.
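To make the copy-step problem concrete, here is a minimal hand-written sketch in C. It is an illustration only, not output of the tool the abstract describes. All names (`lib_scan`, `scan_column_by_copy`, `scan_strided`, `both_variants_agree`) are hypothetical, and the prefix sum stands in for any library routine that expects a contiguous array. The first client path gathers a matrix column into a temporary buffer, calls the routine, and scatters the result back. The second shows the kind of stride-aware variant that an extended interface might expose.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical library routine: in-place inclusive prefix sum
 * over a contiguous array of n doubles. */
void lib_scan(double *x, size_t n) {
    for (size_t i = 1; i < n; i++)
        x[i] += x[i - 1];
}

/* Client-side workaround: gather one column of a row-major
 * rows x cols matrix, call the library routine, scatter back.
 * Returns 0 on success, -1 if the temporary cannot be allocated. */
int scan_column_by_copy(double *a, size_t rows, size_t cols, size_t col) {
    if (rows == 0)
        return 0;
    double *tmp = malloc(rows * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    for (size_t i = 0; i < rows; i++)   /* copy-in step */
        tmp[i] = a[i * cols + col];
    lib_scan(tmp, rows);
    for (size_t i = 0; i < rows; i++)   /* copy-out step */
        a[i * cols + col] = tmp[i];
    free(tmp);
    return 0;
}

/* Assumed shape of an extended interface: the same computation,
 * but consecutive elements are `stride` doubles apart, so no
 * temporary buffer and no copy loops are needed. */
void scan_strided(double *x, size_t n, size_t stride) {
    assert(stride >= 1);
    for (size_t i = 1; i < n; i++)
        x[i * stride] += x[(i - 1) * stride];
}

/* Applies both variants to column 1 of a 3 x 4 row-major matrix
 * and returns 1 if the resulting matrices are identical. */
int both_variants_agree(void) {
    double a[12] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
    double b[12] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
    if (scan_column_by_copy(a, 3, 4, 1) != 0)
        return 0;
    scan_strided(b + 1, 3, 4);  /* column 1 starts at offset 1, stride = cols */
    for (size_t i = 0; i < 12; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

Both paths compute the same result. The strided variant simply skips the two extra passes over the data and the temporary allocation, which is the overhead the abstract attributes to explicit copy steps.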
