Journal: PLoS Computational Biology

Detecting Statistically Significant Common Insertion Sites in Retroviral Insertional Mutagenesis Screens

Abstract

Retroviral insertional mutagenesis screens, which identify genes involved in tumor development in mice, have yielded a substantial number of retroviral integration sites, and this number is expected to grow substantially due to the introduction of high-throughput screening techniques. The data of various retroviral insertional mutagenesis screens are compiled in the publicly available Retroviral Tagged Cancer Gene Database (RTCGD). Analyzing these screens jointly for the presence of common insertion sites (CISs, i.e., regions in the genome that have been hit by viral insertions in multiple independent tumors significantly more than expected by chance) requires an approach that corrects for the increased probability of finding false CISs as the amount of available data increases. Moreover, significance estimates of CISs should be established taking into account both the noise, arising from the random nature of the insertion process, and the bias, stemming from preferential insertion sites present in the genome and from the data retrieval methodology. We introduce a framework, the kernel convolution (KC) framework, to find CISs in a noisy and biased environment using a predefined significance level while controlling the family-wise error (FWE) (the probability of detecting false CISs). Where previous methods use one, two, or three predetermined fixed scales, our method is capable of operating at any biologically relevant scale. This creates the possibility to analyze the CISs in a scale space by varying the width of the CISs, providing new insights into the behavior of CISs across multiple scales. Our method also features the possibility of including models for background bias. Using simulated data, we evaluate the KC framework using three kernel functions: the Gaussian, triangular, and rectangular kernel functions. We applied the Gaussian KC to the data from the combined set of screens in the RTCGD and found that 53% of the CISs do not reach the significance threshold in this combined setting. Still, with the FWE under control, application of our method resulted in the discovery of eight novel CISs, each of which has a probability of less than 5% of being a false detection.
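
The general idea of kernel convolution with family-wise error control can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a single linear genome, a uniform background (no bias model), and a permutation-style null in which the threshold is the (1 - alpha) quantile of the maximum smoothed density over random placements. The function names (`gaussian_kc`, `fwe_threshold`, `find_cis`), the genome length, the kernel width, and the simulated data are all hypothetical.

```python
import numpy as np


def gaussian_kc(positions, grid, scale):
    """Smoothed insertion density: a Gaussian kernel of width `scale` (bp)
    is centred on every insertion and the kernels are summed on `grid`."""
    z = (grid[:, None] - positions[None, :]) / scale
    return np.exp(-0.5 * z**2).sum(axis=1)


def fwe_threshold(n_insertions, genome_length, grid, scale,
                  alpha=0.05, n_perm=200, seed=None):
    """Assumed null model: insertions fall uniformly at random.
    Taking the (1 - alpha) quantile of the genome-wide maximum bounds the
    chance of any false peak under this null at roughly alpha."""
    rng = np.random.default_rng(seed)
    maxima = np.empty(n_perm)
    for i in range(n_perm):
        null_positions = rng.uniform(0, genome_length, size=n_insertions)
        maxima[i] = gaussian_kc(null_positions, grid, scale).max()
    return np.quantile(maxima, 1 - alpha)


def find_cis(positions, grid, scale, threshold):
    """Return grid positions of local density maxima above `threshold`,
    together with the density itself."""
    density = gaussian_kc(positions, grid, scale)
    mid = density[1:-1]
    is_peak = (mid > density[:-2]) & (mid >= density[2:]) & (mid > threshold)
    return grid[1:-1][is_peak], density


# Simulated example (hypothetical numbers): 50 background insertions
# plus a cluster of 10 insertions around position 500,000.
rng = np.random.default_rng(0)
genome_length = 1_000_000
grid = np.arange(0, genome_length + 1, 1_000, dtype=float)
background = rng.uniform(0, genome_length, size=50)
cluster = 500_000 + rng.uniform(-1_000, 1_000, size=10)
insertions = np.concatenate([background, cluster])

threshold = fwe_threshold(insertions.size, genome_length, grid,
                          scale=5_000, seed=1)
peaks, density = find_cis(insertions, grid, 5_000, threshold)
print("threshold:", threshold)
print("candidate CIS positions:", peaks)
```

Rerunning `fwe_threshold` and `find_cis` over a range of `scale` values gives a rough analogue of the scale-space analysis described in the abstract. Swapping the Gaussian for a triangular or rectangular kernel only requires changing the expression inside `gaussian_kc`.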
