Journal: Frontiers in Cell and Developmental Biology (NIH literature collection)

Integrative analysis of multiple diverse omics datasets by sparse group multitask regression

Abstract

A variety of high-throughput genome-wide assays enable the exploration of genetic risk factors underlying complex traits. Although these studies have had a remarkable impact on identifying susceptibility biomarkers, they suffer from issues such as limited sample size and low reproducibility. Combining individual studies from different genetic levels and platforms promises to improve the power and consistency of biomarker identification. In this paper, we propose a novel integrative method, sparse group multitask regression, for integrating diverse omics datasets, platforms, and populations to identify risk genes and factors of complex diseases. This method combines multitask learning with sparse group regularization in order to: (1) treat the biomarker identification in each single study as a task and then combine the tasks by multitask learning; (2) group variables from all studies for identifying significant genes; and (3) enforce a sparsity constraint on groups of variables to overcome the “small sample, but large variables” problem. We introduce two sparse group penalties, sparse group lasso and sparse group ridge, into our multitask model and provide an effective algorithm for each model. In addition, we propose a significance test for the identification of potential risk genes. Two simulation studies are performed to evaluate the performance of our integrative method by comparing it with a conventional meta-analysis method. The results show that our sparse group multitask method significantly outperforms the meta-analysis method. In an application to our osteoporosis studies, 7 genes are identified as significant by our method and are found to have significant effects in three other independent studies used for validation. The most significant gene, SOD2, was identified in our previous osteoporosis study involving the same expression dataset. Several other genes, such as TREML2, HTR1E, and GLO1, are shown to be novel susceptibility genes for osteoporosis, as confirmed by other studies.
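
The three ingredients listed in the abstract (one regression task per study, variables grouped across studies, and a sparsity constraint on those groups) can be illustrated with a small sketch. The code below is not the authors' algorithm: the abstract does not give the objective function or the solver, so the least-squares loss, the choice of one group per variable across all studies, the proximal gradient solver, the penalty weights `lam1` and `lam2`, the function names, and the simulated data are all assumptions made purely for illustration. It covers only a sparse group lasso style penalty, not the sparse group ridge variant or the significance test.

```python
import numpy as np

def sgl_prox(V, t1, t2):
    """Proximal step for t1*||V||_1 + t2*sum_j ||V[j, :]||_2 (each row is a group)."""
    # Element-wise soft threshold (lasso part).
    S = np.sign(V) * np.maximum(np.abs(V) - t1, 0.0)
    # Row-wise shrinkage (group part): a whole row is zeroed if its norm is <= t2.
    norms = np.linalg.norm(S, axis=1, keepdims=True)
    shrink = np.maximum(1.0 - t2 / np.maximum(norms, 1e-12), 0.0)
    return S * shrink

def fit_sparse_group_multitask(Xs, ys, lam1, lam2, n_iter=1000):
    """Hypothetical helper: one column of B per study, one row (group) per variable."""
    p, K = Xs[0].shape[1], len(Xs)
    B = np.zeros((p, K))
    # The loss separates across studies, so the largest per-study
    # Lipschitz constant of the gradient gives a safe step size.
    step = 1.0 / max(np.linalg.norm(X, 2) ** 2 / X.shape[0] for X in Xs)
    for _ in range(n_iter):
        # Gradient of sum_k ||y_k - X_k b_k||^2 / (2 n_k), one column per study.
        G = np.column_stack([X.T @ (X @ B[:, k] - y) / X.shape[0]
                             for k, (X, y) in enumerate(zip(Xs, ys))])
        B = sgl_prox(B - step * G, step * lam1, step * lam2)
    return B

# Simulated data (assumption): 3 studies of different sizes, 50 shared
# variables, of which only the first 3 affect the outcome in every study.
rng = np.random.default_rng(0)
p, sizes = 50, (40, 60, 80)
B_true = np.zeros((p, len(sizes)))
B_true[:3, :] = 2.0
Xs = [rng.standard_normal((n, p)) for n in sizes]
ys = [X @ B_true[:, k] + 0.5 * rng.standard_normal(X.shape[0])
      for k, X in enumerate(Xs)]

B_hat = fit_sparse_group_multitask(Xs, ys, lam1=0.1, lam2=0.2)
row_norms = np.linalg.norm(B_hat, axis=1)
print("selected variable indices:", np.flatnonzero(row_norms > 0))
```

In this sketch a variable counts as selected when its row of `B_hat` is not entirely zero, which is how the group penalty pools evidence across studies: a weak but consistent signal in every study can keep a row alive even when no single study would retain it on its own.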