Journal: Bioinformatics (U.S. National Institutes of Health literature)

Penalized logistic regression for high-dimensional DNA methylation data with case-control studies


Abstract

Motivation: DNA methylation is a molecular modification of DNA that plays a crucial role in the regulation of gene expression. In particular, CpG-rich regions are frequently hypermethylated in cancer tissues but unmethylated in normal tissues. However, compared with microarray gene expression, there is little methodological literature on case-control association studies for high-dimensional DNA methylation data. One key feature of DNA methylation data is a grouped structure: CpG sites from the same gene are possibly highly correlated. In this article, we propose a penalized logistic regression model for correlated DNA methylation CpG sites within genes from high-dimensional array data. Our regularization procedure combines the l1 penalty with a squared l2 penalty on degree-scaled differences of the coefficients of CpG sites within one gene, so it induces both sparsity and smoothness in the correlated regression coefficients. We combine the penalized procedure with a stability selection procedure, which provides a selection probability for each regression coefficient and helps us make a stable and confident selection of the methylation CpG sites that are possibly truly associated with the outcome.

Results: Using simulation studies, we demonstrated that the proposed procedure outperforms existing mainstream regularization methods such as lasso and elastic-net when the data are correlated within a group. We also applied our method to identify important CpG sites and corresponding genes for ovarian cancer from over 20 000 CpGs generated by the Illumina Infinium HumanMethylation27K Beadchip. Some of the genes identified are potentially associated with cancers.

Supplementary information: Supplementary data are available at Bioinformatics online.
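The abstract describes the penalty and the stability selection step only in words, so a small sketch may help make them concrete. The code below is an illustration under stated assumptions, not the authors' implementation: it assumes that all CpG sites within a gene are linked to one another (so each site in a gene with k sites has degree k - 1), and that stability selection draws random half-samples. The names `grouped_penalty`, `selection_probabilities`, `lam1`, `lam2` and the `fit` callable are hypothetical.

```python
import random
from math import sqrt


def grouped_penalty(beta, genes, lam1, lam2):
    """l1 penalty plus squared l2 penalty on degree-scaled coefficient differences.

    beta  : list of regression coefficients, one per CpG site
    genes : list of lists; each inner list holds the indices of one gene's CpG sites
    Assumption: sites within a gene are fully connected, so the degree of
    every site in a gene with k sites is k - 1.
    """
    l1 = sum(abs(b) for b in beta)
    smooth = 0.0
    for sites in genes:
        degree = len(sites) - 1
        if degree < 1:
            continue  # a single-site gene has no within-gene differences
        for a in range(len(sites)):
            for b in range(a + 1, len(sites)):
                u, v = sites[a], sites[b]
                diff = beta[u] / sqrt(degree) - beta[v] / sqrt(degree)
                smooth += diff * diff
    return lam1 * l1 + lam2 * smooth


def selection_probabilities(n_samples, n_features, fit, n_rounds=100, seed=0):
    """Fraction of random half-samples in which each feature is selected.

    fit : hypothetical callable that takes a list of sample indices, fits the
          penalized model on those samples, and returns the indices of the
          features with nonzero coefficients.
    """
    rng = random.Random(seed)
    counts = [0] * n_features
    for _ in range(n_rounds):
        subsample = rng.sample(range(n_samples), n_samples // 2)
        for j in fit(subsample):
            counts[j] += 1
    return [c / n_rounds for c in counts]


# Equal coefficients within a gene incur no smoothness penalty;
# concentrating the same l1 mass on one site does.
smooth_case = grouped_penalty([0.5, 0.5, 0.5], [[0, 1, 2]], lam1=1.0, lam2=1.0)
spiky_case = grouped_penalty([1.5, 0.0, 0.0], [[0, 1, 2]], lam1=1.0, lam2=1.0)
print(smooth_case < spiky_case)
```

Under these assumptions, two coefficient vectors with the same l1 norm are penalized differently: the one whose within-gene coefficients are similar is cheaper, which is the smoothness effect the abstract refers to. The selection probabilities would then be thresholded to pick CpG sites; the threshold itself is not stated in the abstract.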
