Journal: Information and Inference

Data-driven regularization of Wasserstein barycenters with an application to multivariate density registration

We present a framework to simultaneously align and smooth data in the form of multiple point clouds sampled from unknown densities with support in a d-dimensional Euclidean space. This work is motivated by applications in bioinformatics, where researchers aim to automatically homogenize large datasets in order to compare and analyze characteristics within the same cell population. Unfortunately, the acquired information is almost certainly noisy because of misalignment caused by technical variations of the environment. To overcome this problem, we propose to register multiple point clouds by using the notion of regularized barycenters (or Fréchet means) of a set of probability measures with respect to the Wasserstein metric. The first approach consists in penalizing a Wasserstein barycenter with a convex functional, as recently proposed in [5]. The second strategy is to transform the Wasserstein metric itself into an entropy-regularized transportation cost between probability measures, as introduced in [12]. The main contribution of this work is to propose data-driven choices for the regularization parameters involved in each approach using the Goldenshluger–Lepski principle. Simulated data sampled from Gaussian mixtures are used to illustrate each method, and an application to the analysis of flow cytometry data is then presented. This way of choosing the regularization parameter for the Sinkhorn barycenter is also analyzed through the prism of an oracle inequality that relates the error made by such data-driven estimators to that of an ideal estimator.
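
To make the second strategy more concrete, the following is a minimal sketch of an entropy-regularized (Sinkhorn) barycenter computed by iterative Bregman projections. It is not the authors' implementation. It assumes the point clouds have already been binned into histograms on a shared one-dimensional grid, uses a squared Euclidean cost, and fixes the regularization parameter `eps` by hand, whereas the abstract proposes a data-driven choice. The function name `sinkhorn_barycenter`, its parameters, and the toy data are all illustrative assumptions.

```python
import numpy as np

def sinkhorn_barycenter(hists, cost, eps, weights=None, n_iter=500):
    """Entropy-regularized barycenter of histograms (illustrative sketch).

    hists : (n_bins, n_measures) array, each column a probability vector.
    cost  : (n_bins, n_bins) ground cost matrix on the shared grid.
    eps   : entropic regularization parameter (hand-picked here).
    """
    n_bins, n_meas = hists.shape
    if weights is None:
        weights = np.full(n_meas, 1.0 / n_meas)
    K = np.exp(-cost / eps)              # Gibbs kernel
    v = np.ones((n_bins, n_meas))        # one scaling vector per measure
    for _ in range(n_iter):
        # Projection 1: each coupling's first marginal equals its input histogram.
        u = hists / (K @ v)
        # Second marginals of the current couplings diag(u) K diag(v).
        second = v * (K.T @ u)
        # Projection 2: all second marginals equal their weighted geometric mean.
        bary = np.exp((weights * np.log(second)).sum(axis=1))
        v = v * bary[:, None] / second
    return bary / bary.sum()             # renormalize the final iterate

# Toy data (assumption): two Gaussian bumps on [0, 1], mirrored about 0.5.
grid = np.linspace(0.0, 1.0, 101)
centers = np.array([0.25, 0.75])
hists = np.exp(-(grid[:, None] - centers[None, :]) ** 2 / (2 * 0.05 ** 2))
hists += 1e-10                           # small floor keeps every bin positive
hists /= hists.sum(axis=0, keepdims=True)

cost = (grid[:, None] - grid[None, :]) ** 2
bary = sinkhorn_barycenter(hists, cost, eps=0.02)
```

In this toy setting a larger `eps` gives a smoother, more spread-out barycenter and a smaller `eps` gives a sharper one, which is the trade-off that a data-driven choice of the regularization parameter is meant to resolve.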