PLoS Genetics

Integration Analysis of Three Omics Data Using Penalized Regression Methods: An Application to Bladder Cancer



Abstract

Omics data integration is becoming necessary to investigate the genomic mechanisms involved in complex diseases. Integration raises many challenges: data heterogeneity, a number of individuals that is small relative to the number of parameters, multicollinearity, and the difficulty of interpreting and validating results, given their complexity and the limited knowledge of the underlying biological processes. To overcome some of these issues, innovative statistical approaches are being developed. In this work, we propose a permutation-based method that simultaneously assesses significance and corrects for multiple testing using the MaxT algorithm. We applied it with penalized regression methods (LASSO and ENET) to explore relationships between common genetic variants, DNA methylation and gene expression measured in bladder tumor samples. The overall analysis flow consisted of three steps: (1) for each gene expression probe, SNPs/CpGs were selected within a 1 Mb window upstream and downstream of the gene; (2) LASSO and ENET were applied to assess the association between each expression probe and the selected SNPs/CpGs in three multivariable models (SNP, CPG, and Global models, the latter integrating SNPs and CpGs); and (3) the significance of each model was assessed using the permutation-based MaxT method. We identified 48 genes whose expression levels were significantly associated with both SNPs and CpGs. Importantly, 36 (75%) of them were replicated in an independent data set (TCGA), and the performance of the proposed method was checked with a simulation study. We further support our results with a biological interpretation based on an enrichment analysis. The proposed approach reduces computational time and is flexible and easy to implement when analyzing several types of omics data. Our results highlight the importance of integrating omics data with appropriate statistical strategies to gain new insights into the complex genetic mechanisms involved in disease.
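To illustrate step (3), the following is a minimal sketch of a single-step, permutation-based MaxT adjustment, not the authors' implementation. The abstract does not specify the test statistic or the permutation scheme, so several choices here are assumptions: the function names are hypothetical, the statistic is a squared Pearson correlation with a single predictor (a dependency-free stand-in for a fit statistic from a LASSO or ENET model), sample labels of the outcomes are permuted jointly across all tests, and the `+1` correction counts the observed data as one permutation.

```python
import random


def maxt_adjusted_pvalues(observed, permuted):
    """Single-step MaxT: compare each observed statistic with the
    per-permutation maximum taken over all tests."""
    max_per_perm = [max(row) for row in permuted]
    n_perm = len(max_per_perm)
    # Assumption: +1 in numerator and denominator counts the observed
    # data as one permutation, so no adjusted p-value is exactly zero.
    return [(1 + sum(m >= t for m in max_per_perm)) / (n_perm + 1)
            for t in observed]


def squared_correlation(x, y):
    """Stand-in statistic (assumption): squared Pearson correlation.
    In the setting described above, this would be replaced by a fit
    statistic from a penalized regression model."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy * sxy / (sxx * syy)


def permutation_maxt(predictors, outcomes, stat_fn, n_perm=200, seed=0):
    """predictors[j] and outcomes[j] hold the data for test j, measured
    on the same samples in the same order."""
    rng = random.Random(seed)
    n_samples = len(outcomes[0])
    observed = [stat_fn(x, y) for x, y in zip(predictors, outcomes)]
    permuted = []
    for _ in range(n_perm):
        idx = list(range(n_samples))
        rng.shuffle(idx)  # one shuffle shared by all tests in this permutation
        permuted.append([stat_fn(x, [y[i] for i in idx])
                         for x, y in zip(predictors, outcomes)])
    return observed, maxt_adjusted_pvalues(observed, permuted)


# Toy data (simulated, not from the study): 5 tests on 30 samples,
# with a true association only in test 0.
rng = random.Random(1)
n_samples, n_tests = 30, 5
predictors = [[rng.gauss(0, 1) for _ in range(n_samples)] for _ in range(n_tests)]
outcomes = [[rng.gauss(0, 1) for _ in range(n_samples)] for _ in range(n_tests)]
outcomes[0] = [x + 0.1 * rng.gauss(0, 1) for x in predictors[0]]

observed, adjusted = permutation_maxt(predictors, outcomes, squared_correlation)
for j, (t, p) in enumerate(zip(observed, adjusted)):
    print(f"test {j}: statistic={t:.3f}, MaxT-adjusted p={p:.4f}")
```

Because each observed statistic is compared with the maximum over all tests within a permutation, a single set of permutations yields p-values that are already adjusted for multiple testing, which is consistent with the abstract's claim that significance assessment and correction happen in one pass.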
