International World Wide Web Conference (WWW '09)

Latent Space Domain Transfer between High Dimensional Overlapping Distributions


Abstract

Transferring knowledge from one domain to another is challenging for a number of reasons. Since both the conditional and the marginal distributions of the training and test data differ, a model trained in one domain usually has low accuracy when applied directly to a different domain. In many applications with large feature sets, such as text documents, sequence data, medical data, and image data of different resolutions, the two domains usually do not contain exactly the same features, which introduces large numbers of "missing values" when the data are considered over the union of features from both domains. In other words, their marginal distributions are at most overlapping. At the same time, these problems are usually high dimensional, with several thousand features. The combination of high dimensionality and missing values thus makes the relationship between the conditional probabilities of the two domains hard to measure and model. To address these challenges, we propose a framework that first brings the marginal distributions of the two domains closer by "filling up" the missing values of disjoint features. It then looks for comparable sub-structures in the "latent space" mapped from the expanded feature vectors, where both the marginal and conditional distributions are similar. Using these latent-space sub-structures, the proposed approach finds common concepts that are transferable across domains with high probability. During prediction, unlabeled instances are treated as "queries": the most closely related labeled instances from the out-domain are retrieved, and the classification is made by weighted voting over these retrieved out-domain examples.
We formally show that importing feature values across domains and latent semantic indexing jointly make the distributions of the two related domains easier to measure than in the original feature space, and that the nearest-neighbor method employed to retrieve related out-domain examples has bounded error when predicting in-domain examples. Software and datasets are available for download.
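The pipeline described above can be sketched in a few lines of numpy. This is an illustrative toy, not the authors' released software: the data, labels, latent dimension, and zero-filling strategy are all assumptions made for the sketch, and the three steps (fill missing values over the union feature space, project to a latent space via truncated SVD, classify target instances by distance-weighted nearest-neighbor voting over labeled out-domain examples) follow the abstract's description.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: the out-domain (source) has features 0..5, the in-domain
# (target) has features 3..8; features 3-5 overlap, the rest are disjoint.
n_union = 9
src = np.zeros((20, n_union))
tgt = np.zeros((5, n_union))
src[:, :6] = rng.random((20, 6))   # (1) missing cols 6-8 "filled" with 0
tgt[:, 3:] = rng.random((5, 6))    # (1) missing cols 0-2 "filled" with 0
src_labels = (src[:, 3] > 0.5).astype(int)  # synthetic labels, shared feature

# (2) Truncated SVD on the stacked expanded vectors -> k-dim latent space
#     (the latent-semantic-indexing step of the abstract).
k = 3
X = np.vstack([src, tgt])
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Z = X @ Vt[:k].T                   # latent coordinates for all instances
Z_src, Z_tgt = Z[:len(src)], Z[len(src):]

# (3) Treat each unlabeled in-domain instance as a "query": retrieve its
#     nearest labeled out-domain neighbors in latent space and vote,
#     weighted by inverse distance.
def predict(z, k_nn=5):
    d = np.linalg.norm(Z_src - z, axis=1)
    idx = np.argsort(d)[:k_nn]
    w = 1.0 / (d[idx] + 1e-9)
    votes = np.bincount(src_labels[idx], weights=w, minlength=2)
    return int(np.argmax(votes))

preds = np.array([predict(z) for z in Z_tgt])  # one label per target instance
print(preds.shape)
```

Projecting both domains into one latent space after zero-filling is what makes the nearest-neighbor distances comparable at all: in the raw union space, disjoint features would dominate the distance between a source and a target instance.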
