
Asymptotic conditional singular value decomposition for high-dimensional genomic data.



Abstract

High-dimensional data, such as those obtained from a gene expression microarray or second generation sequencing experiment, consist of a large number of dependent features measured on a small number of samples. One of the key problems in genomics is the identification and estimation of factors that associate with many features simultaneously. Identifying the number of factors is also important for unsupervised statistical analyses such as hierarchical clustering. A conditional factor model is the most common model for many types of genomic data, ranging from gene expression, to single nucleotide polymorphisms, to methylation. Here we show that under a conditional factor model for genomic data with a fixed sample size, the right singular vectors are asymptotically consistent for the unobserved latent factors as the number of features diverges. We also propose a consistent estimator of the dimension of the underlying conditional factor model for a finite fixed sample size and an infinite number of features based on a scaled eigen-decomposition. We propose a practical approach for selection of the number of factors in real data sets, and we illustrate the utility of these results for capturing batch and other unmodeled effects in a microarray experiment using the dependence kernel approach of Leek and Storey (2008, Proceedings of the National Academy of Sciences of the United States of America 105, 18718-18723).
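The two claims in the abstract (right singular vectors recovering the latent factors as the number of features grows, and a scaled eigen-decomposition revealing the number of factors) can be illustrated with a small simulation. The following is a minimal sketch, not the authors' procedure: the Gaussian factor model, the function names `simulate`, `subspace_error`, and `estimate_rank`, the use of the smallest scaled eigenvalue as a noise estimate, and the `threshold` value are all assumptions made for illustration, and no observed covariates are conditioned on.

```python
import numpy as np


def simulate(m, n, r, rng, sigma=1.0):
    """Simulate X = Gamma @ G + E with m features, n samples, r latent factors.

    Assumed toy model: all entries are independent Gaussians.
    """
    G = rng.standard_normal((r, n))          # latent factors, r x n
    Gamma = rng.standard_normal((m, r))      # feature loadings, m x r
    E = sigma * rng.standard_normal((m, n))  # independent noise, m x n
    return Gamma @ G + E, G


def subspace_error(X, G):
    """Distance between the span of the top right singular vectors of X
    and the row space of the true factors G (0 means identical subspaces)."""
    r = G.shape[0]
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:r].T                  # n x r, top-r right singular vectors
    Q, _ = np.linalg.qr(G.T)      # n x r, orthonormal basis of row space of G
    resid = Q - V @ (V.T @ Q)     # part of Q not captured by span(V)
    return np.linalg.norm(resid)


def estimate_rank(X, threshold):
    """Count eigenvalues of X^T X / m that exceed the smallest one by more
    than `threshold` (a hypothetical cutoff chosen for this sketch)."""
    m, _ = X.shape
    evals = np.linalg.eigvalsh(X.T @ X / m)  # ascending order
    noise = evals[0]                         # crude noise-level estimate
    return int(np.sum(evals - noise > threshold))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, r = 20, 2  # fixed, small sample size; two latent factors
    for m in (50, 500, 5000, 50000):
        X, G = simulate(m, n, r, rng)
        print(m, subspace_error(X, G), estimate_rank(X, threshold=1.0))
```

With the sample size held at 20, the subspace error should shrink as the number of features `m` increases, which is the fixed-sample-size, diverging-feature regime the abstract describes. Scaling by `m` is what keeps the eigenvalues comparable across different feature counts.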

