BMC Medical Genomics

Hybrid de novo tandem repeat detection using short and long reads

Abstract

Background: As one of the most studied genome rearrangements, tandem repeats have a considerable impact on the genetic background of inherited diseases. Many methods designed to detect tandem repeats in reference sequences obtain high-quality results. In a de novo context, however, where no reference sequence is available, tandem repeat detection remains a difficult problem. The short reads produced by second-generation sequencing are not long enough to span regions that contain long repeats. The long reads produced by third-generation platforms such as Pacific Biosciences overcome this length limitation, but the gain in read length comes with a significant increase in error rate. A central goal of current long-read studies is to cope with error rates as high as 16%.

Methods: In this paper we present MixTaR, the first de novo tandem repeat detection method that combines the high accuracy of short reads with the length of long reads. Our hybrid algorithm uses the set of short reads to detect tandem repeat patterns based on a de Bruijn graph. These patterns are then validated using the long reads, and the tandem repeat sequences are constructed by local greedy assembly.

Results: MixTaR is tested on both simulated and real reads from complex organisms. To analyse its robustness to errors thoroughly, we use short and long reads with different error rates. The results are analysed in terms of the number of tandem repeats detected and the length of their patterns.

Conclusions: Our method shows high precision and sensitivity. With low false-positive rates even for highly erroneous reads, MixTaR detects accurate tandem repeats whose pattern lengths vary over a wide range.
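To make the first two stages of the Methods summary concrete, here is a minimal sketch written under our own assumptions; it is not MixTaR's implementation, which the abstract does not detail. It assumes that a tandem repeat with period p, spanning at least p + k - 1 bases, shows up as a cycle of p edges in a de Bruijn graph built from short-read k-mers, and that the last bases of the cycle's edges spell the repeat pattern. Validation against long reads uses Sellers' semi-global edit-distance recurrence, a standard technique substituted here for whatever matching MixTaR actually uses. All function names and the parameters `k`, `min_count`, `min_period`, `max_period`, `min_copies` and `max_error_rate` are hypothetical. The greedy local assembly stage is not sketched.

```python
# Hypothetical sketch, not MixTaR's code: names, parameters and defaults are
# illustrative assumptions. Stage 1 finds repeat patterns as short cycles in a
# short-read de Bruijn graph; stage 2 keeps patterns supported by a long read.
from collections import Counter, defaultdict


def build_de_bruijn(short_reads, k, min_count=2):
    """Nodes are (k-1)-mers; an edge is added for every k-mer seen at least
    `min_count` times, a simple filter against sequencing errors."""
    counts = Counter(
        read[i:i + k] for read in short_reads for i in range(len(read) - k + 1)
    )
    graph = defaultdict(set)
    for kmer, count in counts.items():
        if count >= min_count:
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def canonical_rotation(pattern):
    """Smallest rotation, so ACG, CGA and GAC are reported as one pattern."""
    return min(pattern[i:] + pattern[:i] for i in range(len(pattern)))


def find_repeat_patterns(graph, min_period=2, max_period=10):
    """Report every simple cycle of min_period..max_period edges; the last
    base of each edge on the cycle spells one copy of the repeat pattern."""
    patterns = set()
    for start in list(graph):
        # Iterative DFS over simple paths (no node revisited) from `start`.
        stack = [(start, "", {start})]
        while stack:
            node, spelled, visited = stack.pop()
            for nxt in graph.get(node, ()):
                step = spelled + nxt[-1]
                if nxt == start and len(step) >= min_period:
                    patterns.add(canonical_rotation(step))
                elif nxt not in visited and len(step) < max_period:
                    stack.append((nxt, step, visited | {nxt}))
    return patterns


def min_substring_edit_distance(query, text):
    """Sellers' recurrence: edit distance between `query` and its best-matching
    substring of `text` (the match may start and end anywhere in `text`)."""
    prev = [0] * (len(text) + 1)  # aligning an empty query costs nothing
    for i, q in enumerate(query, start=1):
        cur = [i] + [0] * len(text)
        for j, t in enumerate(text, start=1):
            cur[j] = min(prev[j - 1] + (q != t), prev[j] + 1, cur[j - 1] + 1)
        prev = cur
    return min(prev)


def validated_by_long_reads(pattern, long_reads, min_copies=3, max_error_rate=0.16):
    """Keep a pattern if some long read holds `min_copies` consecutive copies
    of it within the error budget (0.16 mirrors the 16% figure in the abstract)."""
    target = pattern * min_copies
    budget = int(max_error_rate * len(target))
    return any(min_substring_edit_distance(target, r) <= budget for r in long_reads)
```

As a usage example, tiling 15-base short reads over `"TTGCAGTC" + "ACG" * 6 + "TAGGCTAT"` with `k=5` yields the single pattern `ACG`, and that pattern is then accepted for a long read containing `ACGACGTCG` (one substitution away from three copies). Counting k-mers before building the graph is one simple way to let the accurate short reads suppress isolated errors; the error-tolerant check is reserved for the noisier long reads.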