Computational Modeling of Compositional and Relational Data Using Optimal Transport and Probabilistic Models

Abstract

Quantitative researchers often view the world as a large collection of data generated and organized by the structures and functions of society and technology. Such data are usually presented and accessed through hierarchies, compositions, and relations. Understanding the structures and functions behind them requires models and methods designed specifically for the associated data structures. One of the biggest challenges in achieving this goal is developing a principled data and modeling framework that can meaningfully exploit the structural knowledge in data. These structures include compositional and relational patterns, in which multiple entities must interact or be grouped before they become meaningful. Although conventional vector-based data analysis pipelines have become the standard quantitative framework in many fields of science and technology, they are not directly applicable to compositional and relational data and have several limitations when used to extract knowledge from them.

The goal of this thesis research is to introduce new mathematical models and computational methods for analyzing large-scale compositional and relational data, and to validate the usefulness of these models in solving real-world problems. We begin with background on several topics, including optimal transport, an old but recently revitalized subject in mathematics, and probabilistic graphical models, a popular tool in statistical modeling. In particular, we explain how optimal transport relates to matching, an important modeling concept in machine learning. Next, we present our work on computational algorithms for these relational and structural models, including a fast discrete distribution clustering method using Wasserstein barycenters, a simulated annealing-based inexact oracle for Wasserstein loss minimization, a Bregman ADMM-based oracle for Wasserstein geodesic classification, and a probabilistic multi-graph model for consensus analysis. Their computational complexity, numerical difficulties, scalability, and accuracy are discussed in depth. We apply these algorithms to several areas, such as document analysis and crowdsourcing, by treating data as relational quantities, a perspective that has not been fully studied in the literature. We conclude by discussing the challenges of developing suitable methods for compositional and relational data and by reviewing more recent work that addresses several of these earlier concerns.
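
The link between optimal transport and matching mentioned in the abstract can be made concrete in its simplest case. The sketch below assumes two discrete distributions that each place equal mass 1/n on n support points. In that setting an optimal transport plan can be chosen to be a permutation (a consequence of the Birkhoff–von Neumann theorem), so the Wasserstein distance reduces to a minimum-cost one-to-one matching. The helper name `wasserstein_uniform` and the brute-force search over permutations are illustrative assumptions, practical only for very small n, and are not the algorithms developed in the thesis.

```python
from itertools import permutations
import math


def wasserstein_uniform(xs, ys, p=2):
    """p-Wasserstein distance between two uniform discrete distributions.

    Hypothetical helper for illustration. xs and ys are equal-size lists of
    points (tuples of floats), each point carrying mass 1/n. Because the
    weights are uniform, an optimal transport plan can be taken to be a
    permutation, so we search all one-to-one matchings. O(n!) time: only
    suitable for tiny inputs.
    """
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal size")

    def cost(a, b):
        # Ground cost: Euclidean distance raised to the power p.
        return math.dist(a, b) ** p

    best_cost, best_match = math.inf, None
    for perm in permutations(range(n)):
        # Each matched pair moves mass 1/n, so average the pairwise costs.
        total = sum(cost(xs[i], ys[perm[i]]) for i in range(n)) / n
        if total < best_cost:
            best_cost, best_match = total, perm

    # best_match[i] is the index of the point in ys matched to xs[i].
    return best_cost ** (1.0 / p), best_match
```

For example, `wasserstein_uniform([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)], [(0.0, 1.1), (0.1, 0.0), (1.0, 0.1)])` returns the matching `(1, 2, 0)`, pairing each point with its nearby counterpart, together with a distance of approximately 0.1. Exhaustive search is used here only to make the correspondence with matching explicit; for realistic sizes the same problem is normally solved as an assignment problem (for instance with the Hungarian algorithm) or as a linear program.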

Bibliographic record

  • Author: Ye, Jianbo
  • Affiliation: The Pennsylvania State University
  • Degree-granting institution: The Pennsylvania State University
  • Subjects: Information science; Computer science
  • Degree: Ph.D.
  • Year: 2018
  • Pages: 179
  • Original format: PDF
  • Language of full text: English
