ESTIMATING GENOME-WIDE COPY NUMBER USING ALLELE SPECIFIC MIXTURE MODELS

Abstract

Genomic changes such as copy number alterations are thought to be one of the major underlying causes of human phenotypic variation among normal and disease subjects [23,11,25,26,5,4,7,18]. These include chromosomal regions with so-called copy number alterations: instead of the expected two copies, a section of the chromosome for a particular individual may have zero copies (homozygous deletion), one copy (hemizygous deletion), or more than two copies (amplification). The canonical example is Down syndrome, which is caused by an extra copy of chromosome 21. Identification of such abnormalities in smaller regions has been of great interest, because they are believed to be an underlying cause of cancer. More than a decade ago, comparative genomic hybridization (CGH) technology was developed to detect copy number changes in a high-throughput fashion. However, this technology only provides a resolution of about 10 Mb, which limits the ability to detect copy number alterations spanning small regions. It is widely believed that a copy number alteration as small as one base can have significant downstream effects; microarray manufacturers have therefore developed technologies that provide much higher resolution. Unfortunately, strong probe effects and variation introduced by sample preparation procedures have made single-point copy number estimates too imprecise to be useful. CGH arrays use a two-color hybridization, usually comparing a sample of interest to a reference sample, which to some degree removes the probe effect. However, their resolution is not nearly high enough to provide single-point copy number estimates. Various groups have proposed statistical procedures that pool data from neighboring locations to improve precision. However, these procedures need to average across relatively large regions to work effectively, which greatly reduces the resolution.
Recently, regression-type models that account for the probe effect have been proposed and appear to improve accuracy as well as precision. In this paper, we propose a mixture model solution, specifically designed for single-point estimation, that provides various advantages over the existing methodology. We use a 314-sample database, constructed from public datasets, to motivate and fit models for the conditional distribution of the observed intensities given allele-specific copy numbers. With the estimated models in place, we can compute posterior probabilities that provide a useful prediction rule as well as a confidence measure for each call. Software to implement this procedure will be available in the Bioconductor oligo package (http://www.bioconductor.org).
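
To illustrate how posterior probabilities can serve both as a prediction rule and as a confidence measure, the following is a minimal sketch and not the paper's model or the oligo package API. It assumes, purely for illustration, that the log-scale intensity at a single locus follows a Gaussian distribution given the total copy number, and it ignores the allele-specific structure described above. All names (`posterior_copy_number`, `call_copy_number`) and all numeric parameters are hypothetical.

```python
import math


def log_normal_pdf(x, mean, sd):
    """Log density of a Normal(mean, sd) distribution evaluated at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def posterior_copy_number(x, means, sds, priors):
    """Return P(copy number = k | intensity x) for every state k.

    Applies Bayes' rule to a finite mixture; the computation is done in
    log space and shifted by the maximum to avoid numerical underflow.
    """
    log_w = {
        k: math.log(priors[k]) + log_normal_pdf(x, means[k], sds[k])
        for k in means
    }
    shift = max(log_w.values())
    unnorm = {k: math.exp(v - shift) for k, v in log_w.items()}
    total = sum(unnorm.values())
    return {k: u / total for k, u in unnorm.items()}


# Hypothetical parameters (NOT taken from the paper): the mean and standard
# deviation of the log intensity for each copy number state, plus prior
# weights that favor the expected two copies.
MEANS = {0: 7.0, 1: 9.0, 2: 10.0, 3: 10.6}
SDS = {0: 0.5, 1: 0.3, 2: 0.3, 3: 0.3}
PRIORS = {0: 0.01, 1: 0.04, 2: 0.90, 3: 0.05}


def call_copy_number(x, means=MEANS, sds=SDS, priors=PRIORS):
    """Prediction rule: pick the state with the highest posterior.

    The posterior probability of the chosen state doubles as a
    confidence measure for the call.
    """
    post = posterior_copy_number(x, means, sds, priors)
    best = max(post, key=post.get)
    return best, post[best]


if __name__ == "__main__":
    for intensity in (7.2, 9.1, 10.0, 10.4):
        state, confidence = call_copy_number(intensity)
        print(f"intensity={intensity:.1f} -> copy number {state} "
              f"(posterior {confidence:.3f})")
```

In this sketch an intensity that falls between two component means yields a lower maximum posterior, which is the sense in which the posterior acts as a per-call confidence measure.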