Journal: Nature Communications

Exploring patterns enriched in a dataset with contrastive principal component analysis



Abstract

Visualization and exploration of high-dimensional data is a ubiquitous challenge across disciplines. Widely used techniques such as principal component analysis (PCA) aim to identify dominant trends in one dataset. However, in many settings we have datasets collected under different conditions, e.g., a treatment and a control experiment, and we are interested in visualizing and exploring patterns that are specific to one dataset. This paper proposes a method, contrastive principal component analysis (cPCA), which identifies low-dimensional structures that are enriched in a dataset relative to comparison data. In a wide variety of experiments, we demonstrate that cPCA with a background dataset enables us to visualize dataset-specific patterns missed by PCA and other standard methods. We further provide a geometric interpretation of cPCA and strong mathematical guarantees. An implementation of cPCA is publicly available, and can be used for exploratory data analysis in many applications where PCA is currently used.
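The abstract does not spell out the computation. As a rough illustration of what "enriched relative to comparison data" can mean, the sketch below assumes one common formulation of cPCA: take the top eigenvectors of the target covariance minus a multiple of the background covariance. The function name `cpca_directions`, the contrast parameter `alpha`, and the synthetic data are all assumptions made for illustration and are not taken from the publicly available implementation mentioned above.

```python
import numpy as np


def cpca_directions(target, background, alpha=1.0, n_components=2):
    """Return the top eigenvectors of cov(target) - alpha * cov(background).

    Hypothetical helper: alpha=0 reduces to ordinary PCA on the target data.
    """
    c_target = np.cov(target, rowvar=False)          # np.cov centers the data itself
    c_background = np.cov(background, rowvar=False)
    contrast = c_target - alpha * c_background       # symmetric, possibly indefinite
    eigvals, eigvecs = np.linalg.eigh(contrast)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order]


# Synthetic example (assumed data, not from the paper):
# feature 0 carries large variance shared by both datasets,
# feature 1 separates two subgroups that exist only in the target data.
rng = np.random.default_rng(0)
n = 1000

background = np.column_stack([
    rng.normal(0.0, 5.0, n),   # shared high-variance feature
    rng.normal(0.0, 1.0, n),
    rng.normal(0.0, 1.0, n),
])

group_shift = np.where(np.arange(n) < n // 2, -3.0, 3.0)
target = np.column_stack([
    rng.normal(0.0, 5.0, n),                 # same shared feature
    rng.normal(0.0, 1.0, n) + group_shift,   # target-specific subgroup structure
    rng.normal(0.0, 1.0, n),
])

top_pca = cpca_directions(target, background, alpha=0.0, n_components=1)[:, 0]
top_cpca = cpca_directions(target, background, alpha=2.0, n_components=1)[:, 0]

print("PCA top direction: ", np.round(top_pca, 2))
print("cPCA top direction:", np.round(top_cpca, 2))
```

In this toy setting, ordinary PCA aligns with the shared high-variance feature, while the contrastive direction aligns with the feature that separates the target-only subgroups. The value of `alpha` is picked by hand here; in practice it is a tuning parameter that trades target variance against background variance.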
