Conference: International symposium on medical information processing and analysis

Secure multivariate large-scale multi-centric analysis through on-line learning: an imaging genetics case study


Abstract

State-of-the-art data analysis methods in genetics and related fields have advanced beyond massively univariate analyses. However, these methods are constrained by the limited amount of data available at a single research site. Recent large-scale multi-centric imaging-genetics studies, such as ENIGMA, have to rely on meta-analysis of mass univariate models to reach the sample sizes needed to uncover statistically significant associations. Indeed, model parameters, but not data, can be securely and anonymously shared between partners. Here we propose partial least squares (PLS) as a multivariate imaging-genetics model for meta-studies. In particular, we propose an online estimation approach to PLS that estimates the model parameters sequentially over data batches, based on an approximation of the singular value decomposition (SVD) of partitioned covariance matrices. We applied the proposed approach to the challenging problem of modeling the association between 1,167,117 genetic markers (SNPs, single nucleotide polymorphisms) and brain cortical and sub-cortical atrophy (354,804 anatomical surface features) in a cohort of 639 individuals from the Alzheimer's Disease Neuroimaging Initiative. We compared two different modeling strategies (sequential- and meta-PLS) to the classic non-distributed PLS. Both strategies exhibited only minimal approximation errors in the model parameters. The proposed approaches pave the way to the application of multivariate models in large-scale imaging-genetics meta-studies, and may lead to novel understanding of the complex brain phenotype-genotype interactions.
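The abstract does not spell out the update rule, so the following is only a minimal sketch of one way a batch-wise SVD of the cross-covariance could be maintained; it is not the authors' implementation. It assumes that PLS weights are taken as the leading singular vectors of the cross-covariance X^T Y, that every batch has already been centred on shared means, and that a rank-limited factorisation is an acceptable summary to pass from one batch (or site) to the next. The function names `update_cross_cov_svd` and `sequential_pls_weights` and the `rank` parameter are hypothetical.

```python
import numpy as np


def update_cross_cov_svd(U, s, Vt, Xb, Yb, rank):
    """Fold one centred batch (Xb, Yb) into a rank-limited SVD of X^T Y.

    Assumption: the running cross-covariance is U @ diag(s) @ Vt, and the
    updated one is U @ diag(s) @ Vt + Xb.T @ Yb. The full p x q matrix is
    never formed; only thin factors are handled.
    """
    # Write the updated matrix as A @ B.T with thin factors A and B.
    A = np.hstack([U * s, Xb.T])   # p x (k + n_b)
    B = np.hstack([Vt.T, Yb.T])    # q x (k + n_b)

    # Orthogonalise both factors, then decompose the small core matrix.
    Qa, Ra = np.linalg.qr(A)
    Qb, Rb = np.linalg.qr(B)
    Us, s_new, Vts = np.linalg.svd(Ra @ Rb.T, full_matrices=False)

    # Keep the leading components; truncation is where approximation enters.
    return Qa @ Us[:, :rank], s_new[:rank], Vts[:rank] @ Qb.T


def sequential_pls_weights(batches, p, q, rank):
    """Process batches in order, sharing only (U, s, Vt), never raw data."""
    U, s, Vt = np.zeros((p, 0)), np.zeros(0), np.zeros((0, q))
    for Xb, Yb in batches:
        U, s, Vt = update_cross_cov_svd(U, s, Vt, Xb, Yb, rank)
    return U, s, Vt


if __name__ == "__main__":
    # Synthetic stand-in data; sizes are illustrative, not those of the study.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((60, 8))
    Y = rng.standard_normal((60, 5))
    X -= X.mean(axis=0)
    Y -= Y.mean(axis=0)
    batches = [(X[i:i + 20], Y[i:i + 20]) for i in range(0, 60, 20)]

    U, s, Vt = sequential_pls_weights(batches, p=8, q=5, rank=5)
    s_ref = np.linalg.svd(X.T @ Y, compute_uv=False)
    print("max singular-value gap:", np.max(np.abs(s - s_ref)))
```

In this sketch only the factors `U`, `s` and `Vt` move between batches, which mirrors the idea of sharing model parameters rather than data. When `rank` is smaller than the true rank of the cross-covariance, each update discards the trailing components, so the result is an approximation of the non-distributed solution rather than an exact match.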
