Disparate information fusion in the dissimilarity framework.

Abstract

We study the problem of combining multiple disparate types of data to improve performance on various inferential tasks, and we propose the dissimilarity framework, which consists of two steps: (1) calculate one or more dissimilarity matrices for each data source; and (2) combine all the dissimilarity matrices for the inferential purpose. In the first step, we take advantage of the knowledge of experts in each area and unify disparate types of data into the dissimilarity space. In this dissertation, we focus on developing methods for combining multiple dissimilarity matrices.

One of the most widely used approaches to working with dissimilarity data is to convert the dissimilarity matrix into a configuration of points (called the embedding) through multidimensional scaling, and then to build statistical models based on the embedding. To use observations collected later, called the out-of-sample data, one could redo the embedding and modeling process, but this is inefficient. As an alternative, we develop an out-of-sample embedding approach, OOSIM, which inserts the out-of-sample objects into the existing embedding by minimizing the sum of squared differences between the dissimilarities and the corresponding Euclidean distances. Iterative majorization is used to minimize the criterion function. The simulation experiment suggests that OOSIM is a natural extension of de Leeuw's multidimensional scaling procedure, SMACOF, which minimizes the raw stress.

We develop the J-function approach to combine multiple dissimilarity matrices in the space of the Cartesian product of the embeddings. Because this space is high-dimensional, we introduce a novel supervised dimensionality reduction method. The simulation and real data results show that our approach can improve classification accuracy compared with the alternatives of principal components analysis and no dimensionality reduction at all.

We also consider information fusion from a different perspective. When objects are measured under multiple conditions (e.g., indoor versus outdoor lighting for face recognition, or translations into multiple languages for document matching), the challenging task is to perform data fusion and use all the available information for inferential purposes. We consider two exploitation tasks: (1) how to determine whether a set of feature vectors represents a single object measured under different conditions; and (2) how to create a classifier based on training data collected under one condition in order to classify objects measured under other conditions. The key to both problems is to transform all sets of feature vectors into one commensurate space, where the transformed feature vectors are comparable and can be treated as if they were collected under the same condition. Toward this end, we study Procrustes analysis and develop a new approach. We illustrate our methodology on English and French documents collected from Wikipedia, demonstrating superior performance compared with that obtained via the standard Procrustes transformation.

We introduce a way to generate a collection of 3D shapes from different groups, and study the problem of combining multiple dissimilarity matrices derived from the same set of shapes for classification. Experimental results show that different dissimilarity measures may capture different aspects of the information, and consequently that combining all the dissimilarity matrices in an optimal way yields higher classification accuracy than using any single dissimilarity matrix alone.
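The out-of-sample criterion described in the abstract can be illustrated with a small sketch. This is not the dissertation's OOSIM code, which is not reproduced on this page. It assumes the simplest case: a single new object, unweighted raw stress, and an existing embedding `X` that stays fixed. The names `raw_stress` and `embed_out_of_sample`, the centroid starting point, and the stopping rule are all hypothetical choices made for illustration. The update is the standard majorization step for this criterion, so the stress cannot increase from one iteration to the next, although the iteration may stop at a local minimum.

```python
import numpy as np


def raw_stress(y, X, delta):
    """Sum of squared differences between the dissimilarities delta_i
    and the Euclidean distances ||y - x_i||."""
    return float(np.sum((delta - np.linalg.norm(X - y, axis=1)) ** 2))


def embed_out_of_sample(X, delta, n_iter=500, tol=1e-12):
    """Place one new point into a fixed embedding X (n x d) by iterative
    majorization. Illustrative sketch only; names and defaults are assumed."""
    X = np.asarray(X, dtype=float)
    delta = np.asarray(delta, dtype=float)
    y = X.mean(axis=0)  # assumed starting point: centroid of the embedding
    history = [raw_stress(y, X, delta)]
    for _ in range(n_iter):
        diff = y - X                         # (n, d) offsets to each fixed point
        dist = np.linalg.norm(diff, axis=1)  # (n,) current distances
        # delta_i / d_i, set to 0 where the current distance is zero
        ratio = np.divide(delta, dist, out=np.zeros_like(dist), where=dist > 0)
        # Minimizer of the quadratic majorizer of the raw stress
        y = (X + ratio[:, None] * diff).mean(axis=0)
        history.append(raw_stress(y, X, delta))
        if history[-2] - history[-1] < tol:
            break
    return y, history


# Usage on synthetic data (assumed for illustration, not from the source)
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))                # existing in-sample embedding
y_true = np.array([0.5, -0.25])             # hidden position of the new object
delta = np.linalg.norm(X - y_true, axis=1)  # its dissimilarities to X
y_hat, history = embed_out_of_sample(X, delta)
```

Keeping `X` fixed is what makes this cheaper than redoing the full embedding: each iteration costs one pass over the in-sample points rather than a full multidimensional scaling run.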

Bibliographic record

  • Author

    Ma, Zhiliang

  • Author affiliation

    The Johns Hopkins University

  • Degree-granting institution The Johns Hopkins University
  • Subject Applied Mathematics; Statistics
  • Degree Ph.D.
  • Year 2010
  • Pages 109 p.
  • Total pages 109
  • Original format PDF
  • Text language English
