Frontiers in Genetics

SRBreak: A Read-Depth and Split-Read Framework to Identify Breakpoints of Different Events Inside Simple Copy-Number Variable Regions

Abstract

Copy-number variation (CNV) has been associated with increased risk of complex diseases. High-throughput sequencing (HTS) technologies facilitate the detection of copy-number variable regions (CNVRs) and their breakpoints, which helps in understanding genome structure and its evolution. Various approaches have been proposed for detecting CNV breakpoints, but it remains challenging for tools based on a single analysis method to identify them. It has been shown, however, that pipelines which integrate multiple approaches are able to report more reliable breakpoints. Here, based on HTS data, we have developed a pipeline to identify approximate breakpoints (±10 bp) relating to different ancestral events within a specific CNVR. The pipeline combines read-depth and split-read information to infer breakpoints, and uses information from multiple samples so that an imputation approach can be taken. The main steps are to cluster samples into groups with a normal mixture model, then to apply simple kernel-based approaches that maximize the information obtained from read depth and split reads, and finally to infer the common breakpoints of each group. The pipeline takes split-read information directly from the CIGAR strings of BAM files, without a re-alignment step. On simulated data sets, it was able to report breakpoints for very low-coverage samples, including those for which only single-end reads were available. When applied to three loci from existing human resequencing data sets (NEGR1, LCE3, IRGM), the pipeline obtained good concordance with results from the 1000 Genomes Project (92%, 100%, and 82%, respectively). The package is available at https://github.com/hoangtn/SRBreak, and also as a Docker-based application at https://registry.hub.docker.com/u/hoangtn/srbreak/.
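To illustrate how split-read evidence can be read directly from CIGAR strings without re-alignment, here is a minimal Python sketch. It is not SRBreak's implementation: the function names (`clip_breakpoints`, `most_supported_breakpoints`), the `min_clip` and `min_support` thresholds, and the simple vote-counting step are all assumptions made for illustration. The sketch treats a soft clip (`S`) at either end of an alignment as a hint that a breakpoint lies at the adjacent aligned base, then counts how many reads support each position.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference.
REF_CONSUMING = set("MDN=X")


def clip_breakpoints(pos, cigar, min_clip=10):
    """Return 1-based reference positions hinted at by soft clips.

    pos      -- 1-based leftmost mapped position of the read (SAM POS field)
    cigar    -- CIGAR string of the alignment
    min_clip -- hypothetical threshold: ignore clips shorter than this
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips may flank soft clips; they carry no sequence, so drop them.
    ops = [(n, op) for n, op in ops if op != "H"]
    if not ops:
        return []
    candidates = []
    # Leading soft clip: the breakpoint sits at the first aligned base.
    if ops[0][1] == "S" and ops[0][0] >= min_clip:
        candidates.append(pos)
    # Trailing soft clip: the breakpoint sits at the last aligned base.
    if len(ops) > 1 and ops[-1][1] == "S" and ops[-1][0] >= min_clip:
        ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
        candidates.append(pos + ref_len - 1)
    return candidates


def most_supported_breakpoints(reads, min_clip=10, min_support=2):
    """Count clip-derived positions over (pos, cigar) pairs.

    Returns (position, read_count) tuples, sorted by position, for
    positions supported by at least min_support reads (an assumed cutoff).
    """
    counts = Counter()
    for pos, cigar in reads:
        counts.update(clip_breakpoints(pos, cigar, min_clip))
    return [(p, c) for p, c in sorted(counts.items()) if c >= min_support]


if __name__ == "__main__":
    # Toy alignments, invented for this example only.
    reads = [(100, "60M40S"), (110, "50M50S"), (160, "30S70M"), (300, "100M")]
    print(most_supported_breakpoints(reads))
```

In practice the `(pos, cigar)` pairs would come from a BAM reader rather than a hand-written list. Exact-position voting is the simplest possible aggregation; the abstract describes kernel-based approaches and pooling across samples, which this sketch does not attempt.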
