Journal: BMC Bioinformatics

Copy Number Variation detection from 1000 Genomes project exon capture sequencing data

Abstract

Background: DNA capture technologies combined with high-throughput sequencing now enable cost-effective, deep-coverage, targeted sequencing of complete exomes. This is well suited for SNP discovery and genotyping. However, little attention has been devoted to Copy Number Variation (CNV) detection from exome capture datasets, despite the potentially high impact of CNVs in exonic regions on protein function.

Results: As members of the 1000 Genomes Project analysis effort, we investigated 697 samples in which 931 genes were targeted and sampled with 454 or Illumina paired-end sequencing. We developed a rigorous Bayesian method to detect CNVs in the genes, based on read depth within target regions. Despite substantial variability in read coverage across samples and targeted exons, we were able to identify 107 heterozygous deletions in the dataset. The experimentally determined false discovery rate (FDR) of the cleanest dataset, from the Wellcome Trust Sanger Institute, is 12.5%. We were able to substantially improve the FDR in a subset of gene deletion candidates that were adjacent to another gene deletion call (17 calls). The estimated sensitivity of our call-set was 45%.

Conclusions: This study demonstrates that exonic sequencing datasets, collected in both population-based and medical sequencing projects, will be a useful substrate for detecting genic CNV events, particularly deletions. Based on the number of events we found and the sensitivity of the methods in the present dataset, we estimate an average of 16 genic heterozygous deletions per individual genome. Our power analysis informs ongoing and future projects about the sequencing depth and uniformity of read coverage required for efficient detection.
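The abstract does not give the details of the Bayesian read-depth method, so the following is only a generic illustration of the idea, not the authors' model. It assumes that the read count for a gene in a sample is Poisson-distributed with a mean proportional to copy number, and that a prior over copy-number states is available. The function name, the prior values, and the read counts are all hypothetical.

```python
import math

def copy_number_posterior(observed_reads, expected_diploid_reads, priors):
    """Posterior over copy-number states for one gene in one sample.

    Assumption (not from the paper): reads ~ Poisson(expected * cn / 2).
    """
    log_terms = {}
    for cn, prior in priors.items():
        # Expected depth scales linearly with copy number; a small floor keeps
        # the Poisson mean positive for the homozygous-deletion state (cn = 0).
        mean = max(expected_diploid_reads * cn / 2.0,
                   0.01 * expected_diploid_reads)
        log_lik = (observed_reads * math.log(mean) - mean
                   - math.lgamma(observed_reads + 1))
        log_terms[cn] = math.log(prior) + log_lik
    # Normalise in log space to avoid underflow.
    peak = max(log_terms.values())
    weights = {cn: math.exp(v - peak) for cn, v in log_terms.items()}
    total = sum(weights.values())
    return {cn: w / total for cn, w in weights.items()}

# Hypothetical prior: most genes are diploid, deletions are rare.
priors = {0: 0.001, 1: 0.01, 2: 0.98, 3: 0.009}

# A gene with about half the expected depth favours a heterozygous deletion.
posterior = copy_number_posterior(observed_reads=52,
                                  expected_diploid_reads=100,
                                  priors=priors)
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 4))
```

In practice the expected diploid depth would have to be estimated per gene and per sample, and real coverage is usually more variable than a Poisson model allows, which is consistent with the abstract's remark about substantial variability in read coverage.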