
Graph Based Models for Unsupervised High Dimensional Data Clustering and Network Analysis.


Abstract

The demand for analyzing patterns and structures in data has grown dramatically in recent years. The study of network structure is pervasive in sociology, biology, computer science, and many other disciplines. My research focuses on network and high-dimensional data analysis, using graph-based models and tools from sparse optimization. The specific question about networks that we study is "clustering": partitioning a network into cohesive groups. Depending on the context, these tightly connected groups can be social units, functional modules, or components of an image.

My work consists of both theoretical analysis and numerical simulation. We first analyze several social network and image datasets using a quality function called "modularity", a popular model for clustering in network science. We then study the modularity function from a novel perspective: with my collaborators, we reformulate modularity optimization as the minimization of an energy functional that consists of a total variation term and an L2 balance term. By employing numerical techniques from image processing and L1 compressive sensing, such as the Merriman-Bence-Osher (MBO) scheme, we develop a variational algorithm for this minimization problem.

Along a similar line of research, we work on a multi-class segmentation problem using the piecewise constant Mumford-Shah model in a graph setting. We propose an efficient algorithm for the graph version of the Mumford-Shah model using the MBO scheme. We develop a theoretical analysis and prove that a Lyapunov functional decreases as the algorithm proceeds. Furthermore, to reduce the computational cost for large datasets, we incorporate the Nystrom extension method, which efficiently approximates eigenvectors of the graph Laplacian from a small portion of the weight matrix. Finally, we apply the proposed method to the problem of chemical plume detection in hyperspectral video data.
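The two ingredients named above, the modularity score and the MBO scheme, can be illustrated on a toy graph. The sketch below is a hypothetical illustration, not the dissertation's algorithm: it computes the standard Newman-Girvan modularity and runs a plain multiclass graph MBO iteration (diffuse one-hot class indicators with the graph heat equation, then threshold each node to the class with the largest diffused value). The function names `modularity`, `diffuse`, and `graph_mbo`, the unnormalized Laplacian L = D - W, the explicit-Euler time stepping, and the defaults for `dt`, `steps`, and `iters` are all assumptions. The L2 balance term of the modularity energy and the Nystrom-based eigenvector approximation are deliberately omitted.

```python
"""Hypothetical sketch: modularity score plus a plain multiclass graph MBO loop.

Assumptions (not taken from the dissertation): unnormalized Laplacian L = D - W,
explicit-Euler diffusion, and no balance or fidelity term in the energy.
"""
from typing import List

Matrix = List[List[float]]


def modularity(w: Matrix, labels: List[int]) -> float:
    """Newman-Girvan modularity Q = (1/2m) * sum_ij (w_ij - k_i k_j / 2m) [c_i == c_j]."""
    n = len(w)
    degree = [sum(row) for row in w]
    two_m = sum(degree)
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += w[i][j] - degree[i] * degree[j] / two_m
    return q / two_m


def diffuse(w: Matrix, u: List[List[float]], dt: float, steps: int) -> List[List[float]]:
    """Approximate the graph heat flow du/dt = -L u with explicit Euler steps (L = D - W)."""
    n = len(w)
    degree = [sum(row) for row in w]
    for _ in range(steps):
        new_u = []
        for i in range(n):
            # (L u)_i = d_i * u_i - sum_j w_ij * u_j, computed for every class column.
            lu = [degree[i] * u[i][c] - sum(w[i][j] * u[j][c] for j in range(n))
                  for c in range(len(u[i]))]
            new_u.append([u[i][c] - dt * lu[c] for c in range(len(u[i]))])
        u = new_u
    return u


def graph_mbo(w: Matrix, labels: List[int], k: int, dt: float = 0.1,
              steps: int = 3, iters: int = 10) -> List[int]:
    """Alternate diffusion and thresholding until the labels stop changing."""
    for _ in range(iters):
        # Encode the current partition as one-hot indicator vectors.
        u = [[1.0 if labels[i] == c else 0.0 for c in range(k)] for i in range(len(w))]
        u = diffuse(w, u, dt, steps)
        # Threshold: each node takes the class with the largest diffused value.
        new_labels = [max(range(k), key=lambda c: row[c]) for row in u]
        if new_labels == labels:
            break
        labels = new_labels
    return labels
```

On two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3, starting from the partition `[0, 0, 1, 0, 1, 1]` (nodes 2 and 3 swapped), this loop returns `[0, 0, 0, 1, 1, 1]`, and the modularity rises from -3/14 to 5/14. Without a balance term, plain MBO tends to shrink minority classes on less symmetric inputs, which is one motivation for the balance term described above.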
The graph-based clustering algorithms proposed here significantly improve time efficiency on large-scale datasets. In the last chapter, we also propose an incremental reseeding strategy for clustering, an easy-to-implement and highly parallelizable algorithm for multiway graph partitioning. We demonstrate experimentally that this algorithm achieves state-of-the-art performance in terms of cluster purity on standard benchmark datasets. Moreover, it runs an order of magnitude faster than the competing algorithms.
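The incremental reseeding idea can likewise be sketched in a few lines. The version below is a simplified, hypothetical illustration rather than the dissertation's implementation: starting from a random partition, it plants a few seeds inside each current cluster, spreads each cluster's seed mass with a short random walk, assigns every node to the cluster that delivered the most mass, and then increases the number of seeds for the next round. The function names, the random initialization, the walk length `steps`, the linear seed schedule `seed_growth`, and `rng_seed` are all assumptions made for illustration.

```python
"""Hypothetical sketch of an incremental-reseeding clustering loop.

Assumptions (not from the dissertation): random initial partition, plain
(non-lazy) random walks of fixed length, and a linear seed-growth schedule.
"""
import random
from typing import List

Matrix = List[List[float]]


def random_walk(w: Matrix, f: List[float], steps: int) -> List[float]:
    """Push the mass vector f through `steps` steps of the walk P = D^{-1} W."""
    n = len(w)
    degree = [sum(row) for row in w]
    for _ in range(steps):
        g = [0.0] * n
        for i in range(n):
            if f[i] == 0.0:
                continue
            if degree[i] == 0:
                g[i] += f[i]  # isolated node: its mass stays in place
                continue
            for j in range(n):
                if w[i][j]:
                    g[j] += f[i] * w[i][j] / degree[i]
        f = g
    return f


def incremental_reseeding(w: Matrix, k: int, iters: int = 10, steps: int = 3,
                          seed_growth: int = 1, rng_seed: int = 0) -> List[int]:
    """Return a k-way partition built by repeated seed / walk / assign rounds."""
    rng = random.Random(rng_seed)
    n = len(w)
    labels = [rng.randrange(k) for _ in range(n)]  # random starting partition
    num_seeds = 1
    for _ in range(iters):
        mass = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            f = [0.0] * n
            # Plant up to num_seeds seeds uniformly at random inside cluster c.
            for i in rng.sample(members, min(num_seeds, len(members))):
                f[i] = 1.0
            mass.append(random_walk(w, f, steps))
        # Each node joins the cluster whose seeds delivered the most mass to it.
        labels = [max(range(k), key=lambda c: mass[c][i]) for i in range(n)]
        num_seeds += seed_growth  # reseed with more seeds in the next round
    return labels
```

The k walks within a round do not depend on one another, so they could run in parallel; this sketch simply runs them in sequence.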

Bibliographic details

  • Author: Hu, Huiyi
  • Affiliation: University of California, Los Angeles
  • Degree-granting institution: University of California, Los Angeles
  • Subject: Applied mathematics
  • Degree: Ph.D.
  • Year: 2015
  • Pages: 129
  • Original format: PDF
  • Language: English
