
Contrastive Learning Using Spectral Methods


Abstract

In many natural settings, the analysis goal is not to characterize a single data set in isolation, but rather to understand the difference between one set of observations and another. For example, given a background corpus of news articles together with writings of a particular author, one may want a topic model that explains word patterns and themes specific to the author. Another example comes from genomics, in which biological signals may be collected from different regions of a genome, and one wants a model that captures the differential statistics observed in these regions. This paper formalizes this notion of contrastive learning for mixture models, and develops spectral algorithms for inferring mixture components specific to a foreground data set when contrasted with a background data set. The method builds on recent moment-based estimators and tensor decompositions for latent variable models, and has the intuitive feature of using background data statistics to appropriately modify moments estimated from foreground data. A key advantage of the method is that the background data need only be coarsely modeled, which is important when the background is too complex, noisy, or not of interest. The method is demonstrated on applications in contrastive topic modeling and genomic sequence analysis.
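The idea of using background statistics to modify foreground moments can be illustrated with a small sketch. This is not the paper's algorithm: the abstract mentions moment-based estimators and tensor decompositions, whereas the code below handles only a second-order moment and a plain eigendecomposition. The function names, the contrast weight `gamma`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def second_moment(X):
    """Empirical second moment E[x x^T] from the rows of X (n_samples x dim)."""
    return X.T @ X / X.shape[0]

def contrastive_directions(X_fg, X_bg, gamma=1.0, k=1):
    """Top-k eigenpairs of the foreground moment minus gamma times the background moment.

    gamma is a hypothetical contrast weight; gamma=0 ignores the background entirely.
    """
    M = second_moment(X_fg) - gamma * second_moment(X_bg)
    evals, evecs = np.linalg.eigh(M)          # M is symmetric, so eigh applies
    order = np.argsort(evals)[::-1][:k]       # indices of the k largest eigenvalues
    return evals[order], evecs[:, order]

# Synthetic data (assumed): axis 0 carries strong variation shared by both sets,
# axis 1 carries weaker variation present only in the foreground.
rng = np.random.default_rng(0)
n = 5000
X_bg = rng.normal(size=(n, 3)) * np.array([3.0, 0.1, 0.1])
X_fg = rng.normal(size=(n, 3)) * np.array([3.0, 1.5, 0.1])

_, plain_dir = contrastive_directions(X_fg, X_bg, gamma=0.0)     # foreground only
_, contrast_dir = contrastive_directions(X_fg, X_bg, gamma=1.0)  # background subtracted
```

In this toy setting the foreground-only direction aligns with the shared axis, while the contrasted direction aligns with the foreground-specific axis. The background is summarized by a single moment matrix, which loosely mirrors the abstract's point that the background need only be coarsely modeled.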
