Conference: IEEE International Conference on Bioinformatics and Bioengineering

An Efficient Algorithm for Identifying Genomic Structural Inversion with Wide-spectrum of Length

Abstract

Genomic structural inversion is a class of structural variation that has been widely associated with a range of complex traits and diseases. Accurately identifying inversions from high-throughput sequencing data is of great significance for both research and clinical practice. However, detecting inversions is a challenging computational problem. Existing approaches are either limited to detecting inversions within specific length intervals or require a significant distribution of coverage across the candidate interval. In this paper, we propose a novel detection algorithm that accurately identifies inversions across a wide spectrum of lengths. The proposed algorithm consists of two components: a clustering step and a segmentation and extension step. It first clusters the paired-end reads to narrow down the candidate intervals. It then uses a contig assembly strategy to reconstruct the candidate intervals while applying a segmentation and extension strategy. For each candidate interval, a feature vector is calculated from the characteristic values. Finally, the algorithm combines the comparison verification results to filter out potential false positives and returns the inversion breakpoints at base-pair resolution. We conduct a series of simulation experiments to verify the performance of the proposed algorithm and compare it with two very popular approaches, DELLY and Pindel. The results demonstrate that the proposed approach performs better on inversions across a wide spectrum of lengths, especially when short-to-medium-length inversions are present.
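
The abstract does not specify how the clustering step works. As an illustration only, the sketch below shows one common way to derive candidate intervals from paired-end reads: it assumes that read pairs whose two mates map to the same strand are treated as inversion evidence, and it groups such pairs greedily by proximity. The names `ReadPair`, `is_inversion_signature` and `cluster_candidates`, and the thresholds `max_gap` and `min_support`, are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ReadPair:
    """A mapped read pair, reduced to the fields this sketch needs (hypothetical)."""
    pos1: int      # mapped position of the first mate
    strand1: str   # '+' or '-'
    pos2: int      # mapped position of the second mate
    strand2: str   # '+' or '-'


def is_inversion_signature(pair: ReadPair) -> bool:
    # Assumption: mates mapping to the same strand suggest an inversion.
    return pair.strand1 == pair.strand2


def cluster_candidates(
    pairs: List[ReadPair], max_gap: int = 500, min_support: int = 2
) -> List[Tuple[int, int, int]]:
    """Greedily group same-strand pairs into (start, end, support) intervals.

    max_gap and min_support are illustrative thresholds, not values from the paper.
    """
    # Keep only same-strand pairs, as (left, right) spans sorted by left position.
    spans = sorted(
        (min(p.pos1, p.pos2), max(p.pos1, p.pos2))
        for p in pairs
        if is_inversion_signature(p)
    )
    groups: List[List[Tuple[int, int]]] = []
    for left, right in spans:
        last = groups[-1][-1] if groups else None
        # Extend the current group only if both ends lie close to the previous span.
        if (
            last is not None
            and left - last[0] <= max_gap
            and abs(right - last[1]) <= max_gap
        ):
            groups[-1].append((left, right))
        else:
            groups.append([(left, right)])
    # Report the outer bounds of each sufficiently supported group.
    return [
        (min(l for l, _ in g), max(r for _, r in g), len(g))
        for g in groups
        if len(g) >= min_support
    ]
```

In a full pipeline, each returned interval would then be passed to later stages (in the abstract's terms, contig assembly plus segmentation and extension) to refine the breakpoints; those stages are not sketched here because the abstract gives no further detail.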
