Conference: Annual International Conference on Research in Computational Molecular Biology

Estimating Genome-Wide Copy Number Using Allele Specific Mixture Models

Abstract

Genomic changes such as copy number alterations are thought to be one of the major underlying causes of human phenotypic variation among normal and diseased subjects [23,11,25,26,5,4,7,18]. These include chromosomal regions with so-called copy number alterations: instead of the expected two copies, a section of the chromosome for a particular individual may have zero copies (homozygous deletion), one copy (hemizygous deletion), or more than two copies (amplification). The canonical example is Down syndrome, which is caused by an extra copy of chromosome 21. Identification of such abnormalities in smaller regions has been of great interest, because they are believed to be an underlying cause of cancer. More than a decade ago, comparative genomic hybridization (CGH) technology was developed to detect copy number changes in a high-throughput fashion. However, this technology only provides a resolution of about 10 Mb, which limits the ability to detect copy number alterations spanning small regions. It is widely believed that a copy number alteration as small as one base can have significant downstream effects, so microarray manufacturers have developed technologies that provide much higher resolution. Unfortunately, strong probe effects and variation introduced by sample preparation procedures have made single-point copy number estimates too imprecise to be useful. CGH arrays use a two-color hybridization, usually comparing a sample of interest to a reference sample, which to some degree removes the probe effect. However, the resolution is not nearly high enough to provide single-point copy number estimates. Various groups have proposed statistical procedures that pool data from neighboring locations to improve precision. However, these procedures need to average across relatively large regions to work effectively, which greatly reduces the resolution.
Recently, regression-type models that account for the probe effect have been proposed and appear to improve accuracy as well as precision. In this paper, we propose a mixture model solution specifically designed for single-point estimation that provides various advantages over the existing methodology. We use a 314-sample database, constructed from public datasets, to motivate and fit models for the conditional distribution of the observed intensities given allele-specific copy numbers. With the estimated models in place, we can compute posterior probabilities that provide a useful prediction rule as well as a confidence measure for each call. Software to implement this procedure will be available in the Bioconductor oligo package (http://www.bioconductor.org).
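
The posterior-probability step described above can be illustrated with a minimal sketch. It is not the authors' model or the oligo package API: it assumes, purely for illustration, that the log2 intensities of the A and B alleles are independent Gaussians given the allele-specific copy numbers, and every name and number below (`ALLELE_PARAMS`, `TOTAL_PRIOR`, `posterior`, `call`) is hypothetical.

```python
import math

# Hypothetical (mean, sd) of the log2 intensity for an allele present in 0..3 copies.
# A real analysis would estimate these from training data.
ALLELE_PARAMS = {0: (8.0, 0.4), 1: (10.0, 0.3), 2: (10.8, 0.3), 3: (11.3, 0.3)}

# Hypothetical prior over total copy number (two copies is the expected state).
TOTAL_PRIOR = {0: 0.01, 1: 0.05, 2: 0.88, 3: 0.06}


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def state_priors():
    """Split each total-copy-number prior equally among its (A, B) allele states."""
    priors = {}
    for total, p in TOTAL_PRIOR.items():
        states = [(a, total - a) for a in range(total + 1)]
        for s in states:
            priors[s] = p / len(states)
    return priors


def posterior(intensities):
    """Posterior probability of each (copies of A, copies of B) state."""
    i_a, i_b = intensities
    weighted = {}
    for (a, b), prior in state_priors().items():
        like_a = normal_pdf(i_a, *ALLELE_PARAMS[a])
        like_b = normal_pdf(i_b, *ALLELE_PARAMS[b])
        weighted[(a, b)] = prior * like_a * like_b
    total = sum(weighted.values())
    if total == 0.0:
        raise ValueError("intensities are too extreme for every component")
    return {state: w / total for state, w in weighted.items()}


def call(intensities):
    """Prediction rule: most probable state, with its posterior as the confidence."""
    post = posterior(intensities)
    best = max(post, key=post.get)
    return best, post[best]


if __name__ == "__main__":
    state, confidence = call((10.0, 10.0))
    print(state, round(confidence, 3))
```

In this sketch the call is the state with the highest posterior probability, and that probability doubles as the confidence measure, so a low maximum posterior flags a single-point estimate that should not be trusted.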