首页> 外文OA文献 >Algorithms and Models for Tensors and Networks with Applications in Data Science
【2h】

Algorithms and Models for Tensors and Networks with Applications in Data Science

机译:张量和网络的算法和模型及其在数据科学中的应用

代理获取
本网站仅为用户提供外文OA文献查询和代理获取服务,本网站没有原文。下单后我们将采用程序或人工为您竭诚获取高质量的原文,但由于OA文献来源多样且变更频繁,仍可能出现获取不到、文献不完整或与标题不符等情况,如果获取不到我们将提供退款服务。请知悉。

摘要

Big data plays an increasingly central role in many areas of research including optimization and network modeling. We consider problems applicable to large datasets within these two branches of research. We begin by presenting a nonlinearly preconditioned nonlinear conjugate gradient (PNCG) algorithm to increase the convergence speed of iterative unconstrained optimization methods. We provide a concise overview of several PNCG variants and their properties and obtain a new convergence result for one of the PNCG variants under suitable conditions. We then use the PNCG algorithm to solve two different problems: computing the rank-R canonical tensor decomposition and finding the solution to a latent factor model where latent factor models are often used as important building blocks in many practical recommendation systems. For both problems, the alternating least squares (ALS) algorithm is typically used to find a solution and as such we consider it as a nonlinear preconditioner. Note that the ALS algorithm can be viewed as a nonlinear preconditioner for the NCG algorithm or alternatively, NCG can be viewed as an acceleration process for ALS. We demonstrate numerically that the convergence acceleration mechanism in PNCG often leads to important pay-offs for difficult tensor decomposition problems, with convergence that is significantly faster and more robust than for the stand-alone NCG or ALS algorithms. As well, we show numerically that the PNCG algorithm requires many fewer iterations and less time to reach desired ranking accuracies than stand-alone ALS in solving latent factor models. We next turn to problems within the field of network or graph modeling. A network is a collection of points joined together by lines and networks are used in a broad variety of fields to represent connections between objects. Many large real-world networks share similar properties which has garnered considerable interest in developing models that can replicate these properties. We begin our discussion of graph models by closely examining the Chung-Lu model. The Chung-Lu model is a very simple model where by design the expected degree sequence of a graph generated by the model is equal to a user-supplied degree sequence. We explore what happens both theoretically and numerically when simple changes are made to the model and when the model assumptions are violated. As well, we consider an algorithm used to generate instances of the Chung-Lu model that is designed to be faster than the traditional algorithm but find that it only generates instances of an approximate Chung-Lu model. We explore the properties of this approximate model under a variety of conditions and examine how different the expected degree sequence is from the user-supplied degree sequence. We also explore several ways of improving this approximate model to reduce the approximation error in the expected degree sequence and note that when the assumptions of the original model are violated this error remains very large. We next design a new graph generator to match the community structure found in real-world networks as measured using the clustering coefficient and assortativity coefficient. Our graph generator uses information generated from a clustering algorithm run on the original network to build a synthetic network. Using several real-world networks, we test our algorithm numerically by creating a synthetic network and then comparing the properties to the real network properties as well as to the properties of another popular graph generator, BTER, developed by Seshadhri, Kolda and Pinar. Our graph generator does well at preserving the clustering coefficient and typically outperforms BTER in matching the assortativity coefficient, particularly when the assortativity coefficient is negative.
机译:大数据在包括优化和网络建模在内的许多研究领域中发挥着越来越重要的作用。我们考虑在这两个研究分支中适用于大型数据集的问题。我们首先提出一种非线性预处理的非线性共轭梯度(PNCG)算法,以提高迭代无约束优化方法的收敛速度。我们提供了几种PNCG变体及其特性的简要概述,并在合适的条件下获得了一个PNCG变体的新收敛结果。然后,我们使用PNCG算法来解决两个不同的问题:计算rank-R规范张量分解和找到潜在因子模型的解决方案,其中潜在因子模型在许多实际推荐系统中经常被用作重要的构建基块。对于这两个问题,通常使用交替最小二乘(ALS)算法来查找解决方案,因此我们将其视为非线性前置条件。注意,可以将ALS算法视为NCG算法的非线性预处理器,或者将NCG视为ALS的加速过程。我们用数值方法证明了PNCG中的收敛加速机制通常会导致棘手的张量分解问题的重要收益,其收敛比独立的NCG或ALS算法快得多,并且更鲁棒。同样,我们通过数字显示,在求解潜在因子模型时,与独立ALS相比,PNCG算法所需的迭代次数更少,所需的时间也更少。接下来,我们转向网络或图形建模领域的问题。网络是由线连接在一起的点的集合,网络在各种各样的领域中用于表示对象之间的连接。许多大型的现实世界网络共享相似的属性,因此在开发可以复制这些属性的模型时引起了极大的兴趣。我们通过仔细研究Chung-Lu模型开始对图模型的讨论。 Chung-Lu模型是一个非常简单的模型,其中通过设计,该模型生成的图形的预期程度序列等于用户提供的程度序列。我们将探索对模型进行简单的更改以及违反模型假设时在理论上和数字上都会发生的情况。同样,我们考虑了一种用于生成Chung-Lu模型实例的算法,该算法的设计比传统算法要快,但发现它仅生成近似Chung-Lu模型的实例。我们探索了这种近似模型在各种条件下的性质,并研究了预期的度数序列与用户提供的度数序列有何不同。我们还探索了几种改进此近似模型的方法,以减少预期程度序列中的近似误差,并注意,当违反原始模型的假设时,该误差仍然很大。接下来,我们设计一个新的图生成器,以匹配使用聚类系数和分类系数测量的在现实世界网络中发现的社区结构。我们的图生成器使用从在原始网络上运行的聚类算法生成的信息来构建综合网络。我们使用几个真实世界的网络,通过创建一个合成网络,然后将该属性与实际网络属性以及由Seshadhri,Kolda和Pinar开发的另一种流行图形生成器BTER的属性进行比较,对算法进行数值测试。我们的图生成器在保留聚类系数方面做得很好,并且在匹配分类系数时通常胜过BTER,尤其是当分类系数为负时。

著录项

  • 作者

    Winlaw Manda;

  • 作者单位
  • 年度 2016
  • 总页数
  • 原文格式 PDF
  • 正文语种 en
  • 中图分类

相似文献

  • 外文文献
  • 中文文献
  • 专利

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号