IEEE Conference on Decision and Control
Integrated analysis of multiple high-dimensional data sets by joint rank-1 matrix approximations

Abstract

In this work, we develop an algorithm for the integrated analysis of multiple high-dimensional data matrices based on sparse rank-one matrix approximations. The algorithm approximates each data matrix with a rank-one outer product composed of a sparse left singular vector that is unique to that matrix and a right singular vector that is shared by all of the data matrices. The shared right singular vector represents a signal we wish to detect in the row space of each matrix. The non-zero components of the resulting left singular vectors identify the rows of each matrix that, in aggregate, provide a sparse linear representation of the shared right singular vector. This sparse representation facilitates downstream interpretation and validation of the resulting model based on the rows selected from each matrix. The false discovery rate is used to select an appropriate ℓ1 penalty parameter, which imposes sparsity on the left singular vectors but not on the common right singular vector of the joint approximation. Since a given multi-modal data set (MMDS) may contain multiple signals of interest, the algorithm is applied iteratively to the residualized version of the original data to sequentially capture and model each distinct signal in terms of rows from the different matrices. We show that the algorithm outperforms the standard singular value decomposition in detection accuracy over a wide range of simulation scenarios. Analysis of real data for ovarian and liver cancer resulted in compact gene expression signatures that were predictive of clinical outcomes and highly enriched for cancer-related biology.
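
The abstract does not give the update equations, so the following is only a minimal sketch of one plausible reading of the joint approximation, not the authors' implementation. It assumes each matrix X_k (rows are features, columns are shared samples) is fitted by u_k v^T with an ℓ1 penalty on u_k and a unit-norm shared v, and it alternates a soft-thresholding update for each u_k with a pooled update for v. The function names, the fixed threshold `lam`, and the SVD-based initialization are all assumptions made for illustration; the false-discovery-rate selection of the penalty described above is not implemented.

```python
import numpy as np


def soft_threshold(a, lam):
    """Elementwise soft-thresholding: shrink toward zero by lam."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)


def joint_sparse_rank1(mats, lam, n_iter=100, tol=1e-8):
    """Alternating updates for sparse u_k (one per matrix) and a shared unit v.

    mats : list of 2-D arrays with the same number of columns.
    lam  : soft-threshold level (hypothetical fixed value; the paper
           selects the penalty via false discovery rate instead).
    """
    # Assumed initialization: leading right singular vector of the stacked data.
    _, _, vt = np.linalg.svd(np.vstack(mats), full_matrices=False)
    v = vt[0]
    for _ in range(n_iter):
        # With v fixed and unit-norm, the penalized fit for u_k is a
        # soft-thresholded projection of X_k onto v.
        us = [soft_threshold(X @ v, lam) for X in mats]
        # With all u_k fixed, the best unit v is proportional to sum_k X_k^T u_k.
        g = sum(X.T @ u for X, u in zip(mats, us))
        norm = np.linalg.norm(g)
        if norm == 0.0:
            break  # every u_k was thresholded to zero: lam is too large
        v_new = g / norm
        done = np.linalg.norm(v_new - v) < tol
        v = v_new
        if done:
            break
    # Report left vectors that are consistent with the final v.
    us = [soft_threshold(X @ v, lam) for X in mats]
    return us, v


def deflate(mats, us, v):
    """Residualize each matrix by removing its fitted rank-one component."""
    return [X - np.outer(u, v) for X, u in zip(mats, us)]
```

To look for a further signal under these assumptions, one would call `joint_sparse_rank1` again on the output of `deflate`, mirroring the iterative application to residualized data described in the abstract. The rows selected from matrix k are the indices returned by `np.flatnonzero(us[k])`.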