Journal: International Journal of Epidemiology

Bump hunting to identify differentially methylated regions in epigenetic epidemiology studies

Abstract

Background: During the past 5 years, high-throughput technologies have been successfully used by epidemiology studies, but almost all have focused on sequence variation through genome-wide association studies (GWAS). Today, the study of other genomic events is becoming more common in large-scale epidemiological studies. Many of these, unlike the single-nucleotide polymorphism studied in GWAS, are continuous measures. In this context, the exercise of searching for regions of interest for disease is akin to the problems described in the statistical 'bump hunting' literature.

Methods: New statistical challenges arise when the measurements are continuous rather than categorical, when they are measured with uncertainty, and when both biological signal and measurement error are characterized by spatial correlation along the genome. Perhaps the most challenging complication is that continuous genomic data from large studies are measured throughout long periods, making them susceptible to 'batch effects'. An example that combines all three characteristics is genome-wide DNA methylation measurements. Here, we present a data analysis pipeline that effectively models measurement error, removes batch effects, detects regions of interest and attaches statistical uncertainty to identified regions.

Results: We illustrate the usefulness of our approach by detecting genomic regions of DNA methylation associated with a continuous trait in a well-characterized population of newborns. Additionally, we show that addressing unexplained heterogeneity like batch effects reduces the number of false-positive regions.

Conclusions: Our framework offers a comprehensive yet flexible approach for identifying genomic regions of biological interest in large epidemiological studies using quantitative high-throughput methods.
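
The pipeline outlined in the Methods paragraph can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not specify the individual methods, so every choice here is an assumption made for illustration: a known batch covariate is regressed out in place of whatever batch-effect removal the paper uses, a running mean stands in for the smoother, regions are runs of loci whose smoothed slope exceeds a cutoff, and uncertainty comes from permuting the trait. All function and parameter names (`residualize`, `smoothed_slopes`, `find_regions`, `bump_hunt`, `cutoff`, `max_gap`) are hypothetical.

```python
import numpy as np

def residualize(y, batch):
    """Remove the part of y explained by an intercept and the batch covariate(s)."""
    design = np.column_stack([np.ones(len(batch)), batch])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef

def smoothed_slopes(meth, trait, batch, window):
    """Per-locus slope of methylation on the trait, smoothed along the genome."""
    m = residualize(meth, batch)    # (samples, loci), batch removed
    x = residualize(trait, batch)   # (samples,), batch removed
    beta = m.T @ x / (x @ x)        # least-squares slope at each locus
    # Assumption: a running mean stands in for a proper smoother.
    return np.convolve(beta, np.ones(window) / window, mode="same")

def find_regions(smooth, pos, cutoff, max_gap):
    """Runs of nearby loci with |smooth| > cutoff and a constant sign.

    Returns (first_index, last_index, area) tuples, where area is the sum
    of |smooth| over the run.
    """
    regions, start = [], None
    for i, s in enumerate(smooth):
        above = abs(s) > cutoff
        if start is not None:
            broken = (not above
                      or pos[i] - pos[i - 1] > max_gap
                      or np.sign(s) != np.sign(smooth[start]))
            if broken:
                regions.append((start, i - 1, float(np.abs(smooth[start:i]).sum())))
                start = None
        if above and start is None:
            start = i
    if start is not None:
        regions.append((start, len(smooth) - 1, float(np.abs(smooth[start:]).sum())))
    return regions

def bump_hunt(meth, trait, batch, pos, cutoff, window=3, max_gap=500,
              n_perm=200, seed=0):
    """Candidate regions with permutation p-values based on the largest null area."""
    rng = np.random.default_rng(seed)
    observed = find_regions(smoothed_slopes(meth, trait, batch, window),
                            pos, cutoff, max_gap)
    null_max = np.zeros(n_perm)
    for b in range(n_perm):
        perm = rng.permutation(trait)  # break the trait/methylation link
        null = find_regions(smoothed_slopes(meth, perm, batch, window),
                            pos, cutoff, max_gap)
        null_max[b] = max((area for _, _, area in null), default=0.0)
    return [(s, e, area, (1 + np.sum(null_max >= area)) / (1 + n_perm))
            for s, e, area in observed]
```

Here `meth` is a samples-by-loci matrix, `trait` and `batch` have one entry per sample, and `pos` holds the genomic position of each locus. Comparing each observed area against the largest area in each permutation is one simple way to account for having searched the whole genome; other resampling schemes may be more appropriate when covariates are present.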