Published in: Annual Allerton Conference on Communication, Control, and Computing
Unsupervised Metric Learning in Presence of Missing Data

Abstract

For many machine learning tasks, the input data lie on a low-dimensional manifold embedded in a high-dimensional space and, because of this high-dimensional structure, most algorithms are inefficient. The typical solution is to reduce the dimension of the input data using a standard dimension reduction algorithm such as ISOMAP, LAPLACIAN EIGENMAPS or LLE. This approach, however, does not always work in practice, because these algorithms require reasonably ideal data. Unfortunately, most data sets either have missing entries or unacceptably noisy values. That is, real data are far from ideal and we cannot use these algorithms directly. In this paper, we focus on the case of missing data. Some techniques, such as matrix completion, can be used to fill in missing data, but these methods do not capture the non-linear structure of the manifold. Here, we present a new algorithm, MR-MISSING, that extends these previous algorithms and can be used to compute low-dimensional representations of data sets with missing entries. We demonstrate the effectiveness of our algorithm with three experiments: we visually verify it on synthetic manifolds, we numerically compare our projections on the MNIST data set against those computed by first filling in data using nlPCA and mDRUR, and we show that we can do classification on MNIST with missing data. We also provide a theoretical guarantee for MR-MISSING under some simplifying assumptions.
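As a rough illustration of the "fill in first" baseline that the abstract contrasts against, the sketch below imputes missing entries with an iterated low-rank SVD, one simple linear form of matrix completion. It is not MR-MISSING, whose details are not given here. The function name `low_rank_impute`, the helix data, the rank, and the 20% missing rate are all assumptions chosen for illustration.

```python
import numpy as np

def low_rank_impute(X, rank, n_iter=100):
    """Fill NaN entries of X by iterating a rank-`rank` SVD approximation.

    Hypothetical helper for illustration; a linear baseline, not MR-MISSING.
    """
    missing = np.isnan(X)
    # Start from column means computed over the observed entries only.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Only the missing entries are updated; observed values stay fixed.
        filled = np.where(missing, approx, X)
    return filled

rng = np.random.default_rng(0)
t = rng.uniform(0, 3 * np.pi, size=200)
# Points on a helix: a 1-D non-linear manifold embedded in 3-D space.
X_full = np.column_stack([np.cos(t), np.sin(t), t / 3])
X_obs = X_full.copy()
mask = rng.random(X_obs.shape) < 0.2  # hide roughly 20% of the entries
X_obs[mask] = np.nan

X_filled = low_rank_impute(X_obs, rank=2)
# Error on the hidden entries, measured against the known ground truth.
err = np.sqrt(np.mean((X_filled[mask] - X_full[mask]) ** 2))
print(f"RMSE on hidden entries: {err:.3f}")
```

The imputation model here is a linear subspace, so it can only approximate a curved manifold such as the helix. That is the limitation the abstract attributes to fill-in methods of this kind. A dimension reduction algorithm would then be run on `X_filled` as a second, separate step.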
