Motivation: Array CGH technologies enable the simultaneous measurement of DNA copy number for thousands of sites on a genome. We developed the circular binary segmentation (CBS) algorithm to divide the genome into regions of equal copy number (Olshen et al., 2004). The algorithm tests for change-points using a maximal $t$-statistic with a permutation reference distribution to obtain the corresponding $p$-value. The number of computations required for the maximal test statistic is $O(N^2)$, where $N$ is the number of markers. This makes the full permutation approach computationally prohibitive for the newer arrays that contain tens of thousands of markers and highlights the need for a faster algorithm.

Results: We present a hybrid approach to obtain the $p$-value of the test statistic in linear time. We also introduce a rule for stopping early when there is strong evidence for the presence of a change. We show through simulations that the hybrid approach provides a substantial gain in speed with only a negligible loss in accuracy and that the stopping rule further increases speed. We also present the analysis of array CGH data from a breast cancer cell line to show the impact of the new approaches on the analysis of real data.

Availability: An R (R Development Core Team, 2006) version of the CBS algorithm has been implemented in the "DNAcopy" package of the Bioconductor project (Gentleman et al., 2004). The proposed hybrid method for the $p$-value is available in version 1.2.1 or higher, and the stopping rule for declaring a change early is available in version 1.5.1 or higher.
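
To make the cost of the full permutation approach concrete, here is a minimal Python sketch of a permutation test on a maximal arc statistic. It is an illustration under stated assumptions, not the DNAcopy implementation: the function names `max_arc_statistic` and `permutation_p_value` are hypothetical, the statistic is a simplified standardized difference of means rather than the $t$-statistic used in CBS, and neither the hybrid $p$-value nor the early stopping rule is reproduced.

```python
import random
from itertools import accumulate


def max_arc_statistic(x):
    """Largest absolute standardized mean difference between a contiguous
    arc x[i:j] and its complement, over all arcs. Cost is O(N^2).

    Simplified stand-in (assumption): no variance estimate is applied,
    unlike the t-statistic described in the abstract.
    """
    n = len(x)
    s = [0.0] + list(accumulate(x))  # s[k] = x[0] + ... + x[k-1]
    total = s[n]
    best = 0.0
    for i in range(n):
        for j in range(i + 1, n + 1):
            k = j - i  # arc length
            if k == n:
                continue  # complement would be empty
            arc_sum = s[j] - s[i]
            diff = arc_sum / k - (total - arc_sum) / (n - k)
            z = abs(diff) / (1.0 / k + 1.0 / (n - k)) ** 0.5
            if z > best:
                best = z
    return best


def permutation_p_value(x, n_perm=200, seed=0):
    """Permutation p-value for the maximal statistic.

    Each permutation recomputes the O(N^2) maximum, so the total cost
    is O(n_perm * N^2).
    """
    rng = random.Random(seed)
    observed = max_arc_statistic(x)
    y = list(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y)
        if max_arc_statistic(y) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical toy profile: a gained segment in the middle.
    profile = [0.0] * 10 + [3.0] * 10 + [0.0] * 10
    print(permutation_p_value(profile))
```

Because every permutation repeats the quadratic scan, the run time grows as the number of permutations times $N^2$, which is the cost that the abstract describes as prohibitive for arrays with tens of thousands of markers.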