
Haplotype and Repeat Separation in Long Reads


Abstract

Resolving the correct structure and order of highly similar sequence stretches is one of the main open problems in genome assembly. For non-haploid genomes, this includes determining the sequences of the different haplotypes. For all but the smallest genomes, it also involves separating different repeat instances. In this paper, we discuss methods for resolving such problems in third-generation long reads by classifying alignments between long reads according to whether they represent true or false read overlaps. The main difficulty in this context is the high error rate of such reads, which greatly exceeds the sequence difference between the similar regions we want to separate. Our methods can separate read classes stemming from regions that differ by as little as 1%.
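To illustrate why the read error rate makes this classification hard, the following minimal sketch computes the expected per-base mismatch rate between two aligned reads under a deliberately simplified model. It is not the method of the paper. The assumptions are: errors are substitutions only, each read errs independently at rate `e`, an erroneous base is uniformly one of the three wrong bases, and two similar regions differ at a fraction `d` of sites. The value `e = 0.15` is a hypothetical choice for illustration and is not stated in the abstract. Real long-read errors are dominated by insertions and deletions.

```python
def disagree_same_base(e):
    """P(two reads disagree) at a site where the true bases are identical."""
    # They agree if both are correct, or both err to the same wrong base.
    agree = (1 - e) ** 2 + e ** 2 / 3
    return 1 - agree


def disagree_diff_base(e):
    """P(two reads disagree) at a site where the true bases differ."""
    # They agree if one read is correct and the other errs onto that base
    # (two symmetric cases), or both err to the same third base (two bases).
    agree = 2 * (1 - e) * (e / 3) + 2 * (e / 3) ** 2
    return 1 - agree


def expected_mismatch(e, d):
    """Expected mismatch rate between reads from regions differing at rate d."""
    return (1 - d) * disagree_same_base(e) + d * disagree_diff_base(e)


if __name__ == "__main__":
    e = 0.15  # hypothetical per-read error rate (assumption)
    true_overlap = expected_mismatch(e, 0.0)    # same region
    false_overlap = expected_mismatch(e, 0.01)  # regions 1% apart
    print(f"true overlap:  {true_overlap:.4f}")
    print(f"false overlap: {false_overlap:.4f}")
```

Under these assumptions, reads from the same region mismatch at about 27.0% of positions, while reads from regions that differ by 1% mismatch at about 27.6%. The difference between the two cases is far smaller than the noise in either, which is why the raw identity of a single pairwise alignment is a weak basis for separating true from false overlaps.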
