SIAM Journal on Scientific Computing

COMPUTING THE GRADIENT IN OPTIMIZATION ALGORITHMS FOR THE CP DECOMPOSITION IN CONSTANT MEMORY THROUGH TENSOR BLOCKING

Abstract

The construction of the gradient of the objective function in gradient-based optimization algorithms for computing an r-term CANDECOMP/PARAFAC (CP) decomposition of an unstructured dense tensor is a key computational kernel. The best technique for efficiently implementing this operation has a memory consumption that scales linearly with the number of terms r and sublinearly with the number of elements of the tensor. We consider a blockwise computation of the CP gradient, reducing the memory requirements to a constant. This reduction is achieved by a novel technique that we call implicit block unfoldings, which combines the benefits of the block tensor unfoldings of [Ragnarsson and Van Loan, SIAM J. Matrix Anal. Appl., 33 (2012), pp. 149-169] and the implicit unfoldings of [Phan, Tichavsky, and Cichocki, IEEE Trans. Signal Process., 61 (2013), pp. 4834-4846]. A heuristic algorithm for automatically choosing the division into subtensors is part of the proposed algorithm. The throughput that can be attained is essentially determined by the performance of a matrix product of two small matrices of constant size. Numerical experiments illustrate that the proposed method can outperform the current state-of-the-art by up to two orders of magnitude for large dense tensors in terms of memory consumption, while the increase of the execution time is no more than 5%. The proposed algorithm attained upward of 90% of the theoretical peak performance of the computer system, using no more than 50 MB of memory, irrespective of the size of the tensor and the number of terms r.
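The dominant cost in this gradient is the matricized-tensor-times-Khatri-Rao-product (MTTKRP). The following is a minimal NumPy sketch of the blockwise idea for a third-order tensor only; it is not the paper's implicit block unfoldings (which handle general-order tensors and choose the subtensor division heuristically). The function name `cp_gradient_mode1_blocked` and the block sizes `bj`, `bk` are illustrative assumptions. The point is that each subtensor is paired with a small Khatri-Rao block, so no intermediate of size JK x r is ever materialized.

```python
import numpy as np

def cp_gradient_mode1_blocked(X, A, B, C, bj=64, bk=64):
    """Blockwise gradient of f(A, B, C) = 0.5 * ||X - [[A, B, C]]||^2 w.r.t. A.

    Illustrative sketch only: the subtensor X[:, js, ks] is multiplied with a
    Khatri-Rao block of shape (len(js) * len(ks), r), so the working memory is
    bounded by the block sizes bj, bk rather than growing with J * K.
    """
    I, J, K = X.shape
    r = A.shape[1]
    M = np.zeros((I, r))                      # accumulates the MTTKRP term blockwise
    for j0 in range(0, J, bj):
        js = slice(j0, min(j0 + bj, J))
        Bj = B[js]                            # block of the mode-2 factor
        for k0 in range(0, K, bk):
            ks = slice(k0, min(k0 + bk, K))
            Ck = C[ks]                        # block of the mode-3 factor
            # Khatri-Rao block: row (j, k) = Bj[j, :] * Ck[k, :], with k varying
            # fastest, matching the C-order reshape of the subtensor below.
            KR = (Bj[:, None, :] * Ck[None, :, :]).reshape(-1, r)
            M += X[:, js, ks].reshape(I, -1) @ KR
    # r-by-r normal-equation part; cheap compared with the MTTKRP accumulation.
    G = (B.T @ B) * (C.T @ C)
    return A @ G - M
```

For small tensors the accumulated MTTKRP term can be checked against a dense reference such as `np.einsum('ijk,jr,kr->ir', X, B, C)`; the blockwise result should agree up to floating-point round-off.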