Conference: IEEE International Conference on Bioinformatics and Biomedicine

Detecting chromosomal structural variation using Jaccard distance and parallel architecture

Abstract

Understanding the nature of many diseases, including cancer, requires locating somatically acquired rearrangements that correspond to large-scale chromosomal aberrations. Computational methods that detect inter-chromosomal rearrangements from next-generation sequencing data face a major challenge: accurately predicting the location of sites spanned by a typically small number of reads when the entire sample contains hundreds of millions of reads. In this work, we propose a method called TDJD that identifies the location of inter-chromosomal breakpoints corresponding to large-scale structural variations, in particular translocations and insertions. To reduce the huge dimensionality of the search space, we split candidate reads that may contain breakpoints into windows and represent the windows as a sequence of binary fingerprints. We then search for the location of the breakpoint in the reference genome using Jaccard distance. We combine parallel computing with Jaccard-distance search to solve the exact nearest neighbor problem. The dimensionality reduction takes advantage of an SSE multi-thread architecture to achieve efficient search. We applied our algorithm to identify several reads with breakpoints, including those characterizing the PAX8-PPARγ rearrangement, a frequent modification in follicular thyroid cancer. Our results show that we could identify the breakpoints much faster than the previous method. We also compared our results with several recently published methods and found that our method is faster than all of them while maintaining high accuracy.
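
The pipeline outlined in the abstract (windowing, binary fingerprints, Jaccard-distance search for the exact nearest neighbor) can be illustrated with a minimal Python sketch. This is not the TDJD implementation. The abstract does not say how fingerprints are built, so the k-mer presence encoding, the constants `K` and `BITS`, and all function names here are assumptions made for illustration. The sketch uses the standard definition of Jaccard distance, 1 - |A ∩ B| / |A ∪ B|, and a plain linear scan; the SSE and multi-thread acceleration mentioned in the abstract is not reproduced.

```python
from typing import List, Tuple

K = 4       # k-mer length (illustrative assumption, not from the paper)
BITS = 64   # fingerprint width in bits (illustrative assumption)
_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_bit(kmer: str) -> int:
    """Map a k-mer to a bit position using its base-4 value modulo BITS."""
    value = 0
    for base in kmer:
        value = value * 4 + _CODE[base]
    return value % BITS


def fingerprint(window: str) -> int:
    """Binary fingerprint: bit i is set if some k-mer in the window maps to i."""
    fp = 0
    for i in range(len(window) - K + 1):
        fp |= 1 << kmer_bit(window[i:i + K])
    return fp


def windows(seq: str, size: int, step: int) -> List[Tuple[int, str]]:
    """Split a sequence into (offset, window) pairs of a fixed size."""
    return [(i, seq[i:i + size]) for i in range(0, len(seq) - size + 1, step)]


def jaccard_distance(a: int, b: int) -> float:
    """Jaccard distance between two bitsets; two empty sets have distance 0."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0
    return 1.0 - bin(a & b).count("1") / union


def nearest_window(read_window: str, reference: str, size: int) -> Tuple[int, float]:
    """Exact nearest neighbor by linear scan over every reference window."""
    query = fingerprint(read_window)
    best_offset, best_dist = -1, 2.0  # 2.0 exceeds the maximum distance of 1.0
    for offset, ref_window in windows(reference, size, 1):
        dist = jaccard_distance(query, fingerprint(ref_window))
        if dist < best_dist:
            best_offset, best_dist = offset, dist
    return best_offset, best_dist
```

Calling `nearest_window(read_window, reference, size)` returns the reference offset whose fingerprint is closest to the read window, together with that distance. Because the intersection and union sizes reduce to bitwise AND/OR followed by a population count, this kind of comparison maps naturally onto SIMD instructions and independent threads, which is presumably why a parallel architecture suits the search.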
