
A MULTILEVEL MODEL TO ADDRESS BATCH EFFECTS IN COPY NUMBER ESTIMATION USING SNP ARRAYS



Abstract

Submicroscopic changes in chromosomal DNA copy number dosage are common and have been implicated in many heritable diseases and cancers. Recent high-throughput technologies have a resolution that permits the detection of segmental changes in DNA copy number that span thousands of base pairs across the genome. Genome-wide association studies (GWAS) may simultaneously screen for copy number-phenotype and SNP-phenotype associations as part of the analytic strategy. However, genome-wide array analyses are particularly susceptible to batch effects, because the logistics of preparing DNA and processing thousands of arrays often involve multiple laboratories and technicians, or changes over calendar time to the reagents and laboratory equipment. Failure to adjust for batch effects can lead to incorrect inference and requires inefficient post-hoc quality control procedures that exclude regions that are associated with batch. Our work extends previous model-based approaches for copy number estimation by explicitly modeling batch effects and using shrinkage to improve locus-specific estimates of copy number uncertainty. Key features of this approach include the use of diallelic genotype calls from experimental data to estimate batch- and locus-specific parameters of background and signal without the requirement of training data. We illustrate these ideas using a study of bipolar disease and a study of chromosome 21 trisomy. The former has batch effects that dominate much of the observed variation in quantile-normalized intensities, while the latter illustrates the robustness of our approach to datasets where as many as 25% of the samples have altered copy number. Locus-specific estimates of copy number can be plotted on the copy-number scale to investigate mosaicism and guide the choice of appropriate downstream approaches for smoothing the copy number as a function of physical position.
The software is open source and implemented in the R package CRLMM available at Bioconductor (http://www.bioconductor.org).
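
The idea of estimating parameters per batch and per locus, then using shrinkage to stabilize the locus-specific uncertainty estimates, can be illustrated with a small sketch. This is not the CRLMM implementation and not the model from the paper: it is a generic shrinkage form in which each locus's sample variance within a batch is pulled toward the average variance across loci in that batch. The function name `shrink_locus_variances`, the nested-dictionary input layout, and the `prior_df` weight are all assumptions made for illustration.

```python
from statistics import mean, variance


def shrink_locus_variances(intensities, prior_df=4.0):
    """Shrink per-locus variances toward the batch-wide average variance.

    intensities: {batch: {locus: [intensity, ...]}} (hypothetical layout).
    prior_df: assumed weight given to the batch-wide average, expressed
        as pseudo degrees of freedom.
    Each locus needs at least two values so a sample variance exists.
    """
    shrunk = {}
    for batch, loci in intensities.items():
        # Raw sample variance for every locus, computed within this batch only.
        raw = {locus: variance(values) for locus, values in loci.items()}
        # Shrinkage target: the average raw variance across loci in the batch.
        target = mean(list(raw.values()))
        shrunk[batch] = {}
        for locus, values in loci.items():
            df = len(values) - 1
            # Weighted average of the target and the raw estimate; loci with
            # few samples (small df) are pulled harder toward the target.
            shrunk[batch][locus] = (prior_df * target + df * raw[locus]) / (prior_df + df)
    return shrunk


# Toy data: one batch, two loci, three samples each.
example = {"plate1": {"snp1": [1.0, 2.0, 3.0], "snp2": [2.0, 2.0, 2.0]}}
result = shrink_locus_variances(example)
```

In the toy data, the locus with zero observed variance receives a positive shrunken estimate and the noisier locus is pulled down, which is the stabilizing behavior that shrinkage is meant to provide when per-locus sample sizes are small. Because the target is computed separately for each batch, estimates from one batch never borrow from another.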
