...
Nucleic Acids Research

Error-pooling-based statistical methods for identifying novel temporal replication profiles of human chromosomes observed by DNA tiling arrays

Abstract

Statistical analysis of tiling array data is extremely challenging because of the astronomically large number of sequence probes, the high noise level of individual probes and the limited number of replicates in these data. To overcome these difficulties, we first developed statistical error estimation and weighted ANOVA modeling approaches for high-density tiling array data; the former is based on an advanced error-pooling method that accurately estimates the heterogeneous technical error of small-sample tiling array data. Using these approaches, we analyzed high-density tiling array data on the temporal replication patterns of human chromosomes 21 and 22 during the S phase of the cell cycle in synchronized HeLa cells. We found many novel temporal replication patterns, identifying about 26% of over 1 million tiling array sequence probes as significantly differentially replicated across the four 2-h time periods of S phase. Among these differentially replicated probes, 126 941 sequence probes were matched to 417 known genes. Most of these genes were replicated within one or two consecutive time periods, while the others were replicated in two non-consecutive time periods. Also, coding regions were found to be more differentially replicated in particular time periods than noncoding regions on the gene-poor chromosome 21 (25% differentially replicated among genic probes versus 18.6% among intergenic probes), whereas this phenomenon was less prominent on the gene-rich chromosome 22. Rigorous statistical testing of the local proximity of differentially replicated genic and intergenic probes was performed to identify significant stretches of differentially replicated sequence regions. From this analysis, we found that adjacent genes were frequently replicated in different time periods, potentially implying the existence of quite dense replication origins.
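The abstract names the two statistical ingredients (error pooling and weighted ANOVA) without giving their formulas. The Python sketch below illustrates the general idea under stated assumptions and is not the authors' implementation: within one time period, probes are sorted by mean intensity and split into near-equal-size bins, and each probe borrows the median replicate variance of its bin; a single probe is then tested across the time periods with an inverse-variance-weighted one-way statistic that, treating the pooled variances as known, is referred to a χ² distribution with k − 1 degrees of freedom. The function names, the median pooling rule, the bin count `n_bins` and the χ² reference distribution are all illustrative assumptions.

```python
import math
import statistics
from bisect import bisect_left


def pooled_error_curve(replicates, n_bins=10, floor=1e-12):
    """Pool replicate variances across probes of similar mean intensity (assumed scheme).

    replicates: per-probe replicate lists (>= 2 values each) for ONE time period.
    Returns (edges, variances): the largest mean intensity in each bin and the
    median within-probe variance of that bin, floored to avoid division by zero.
    """
    stats = sorted((statistics.fmean(r), statistics.variance(r)) for r in replicates)
    n = len(stats)
    edges, variances = [], []
    for b in range(n_bins):
        chunk = stats[b * n // n_bins:(b + 1) * n // n_bins]  # near-equal-size bins
        if chunk:
            edges.append(chunk[-1][0])
            variances.append(max(statistics.median([v for _, v in chunk]), floor))
    return edges, variances


def pooled_variance(mean, edges, variances):
    """Look up the pooled variance of the bin that a probe's mean intensity falls into."""
    i = min(bisect_left(edges, mean), len(edges) - 1)
    return variances[i]


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df (closed form)."""
    half = x / 2.0
    if df % 2 == 0:
        return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))
    odd_terms = sum(half ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, df // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * odd_terms


def weighted_anova(groups, curves):
    """Weighted one-way test of whether one probe's signal differs across time periods.

    groups: replicate lists for the probe, one per time period.
    curves: (edges, variances) from pooled_error_curve, one per time period.
    Each period mean gets weight n_i / pooled variance; with the pooled variances
    treated as known, the statistic is referred to chi-square with k - 1 df.
    """
    means = [statistics.fmean(g) for g in groups]
    weights = [len(g) / pooled_variance(m, *c) for g, m, c in zip(groups, means, curves)]
    grand = sum(w * m for w, m in zip(weights, means)) / sum(weights)
    stat = sum(w * (m - grand) ** 2 for w, m in zip(weights, means))
    return stat, chi2_sf(stat, len(groups) - 1)
```

In use, one would call `pooled_error_curve` once per time period over all probes and then `weighted_anova` once per probe; with over a million probes, the resulting p-values would still need a multiple-testing correction (for example, false discovery rate control) before any probe is called differentially replicated.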
Evaluating the conditional probability significance of the identified gene ontology terms on chromosomes 21 and 22, we detected some over-represented molecular functions and biological processes among these differentially replicated genes, such as those relevant to hydrolase, transferase and receptor-binding activities. Some of these results were confirmed, showing >70% consistency with cDNA microarray data that were independently generated in parallel with the tiling arrays. Thus, our improved analysis approaches, specifically designed for high-density tiling array data, enabled us to reliably and sensitively identify many novel temporal replication patterns on human chromosomes.
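The abstract does not say exactly how the "conditional probability significance" of a GO term was computed. A common choice for this kind of over-representation question is a one-sided hypergeometric test, sketched below in Python as an assumption rather than as the paper's procedure; `go_term_p_value` and its argument names are hypothetical.

```python
import math


def go_term_p_value(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k) for one GO term (assumed test).

    N: annotated genes in the background (e.g. all genes on the tiled chromosomes)
    K: background genes carrying the GO term
    n: differentially replicated genes
    k: differentially replicated genes carrying the GO term
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("counts must satisfy 0 <= k <= n <= N and 0 <= K <= N")
    denom = math.comb(N, n)
    # Sum the probabilities of drawing k, k+1, ... annotated genes among the n hits.
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / denom
```

For example, with N = 10 background genes, K = 3 of them annotated with a term and n = 3 differentially replicated genes, observing all three annotated genes among the hits gives p = 1/120. When many GO terms are screened, these p-values would likewise need a multiple-testing adjustment.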