Source: Genes (via the U.S. National Institutes of Health literature collection)

Probably Correct: Rescuing Repeats with Short and Long Reads

Abstract

Ever since the introduction of high-throughput sequencing following the Human Genome Project, assembling short reads into a reference of sufficient quality has posed a significant problem, because a large portion of the human genome (an estimated 50–69%) is repetitive. As a result, a sizable proportion of sequencing reads are multi-mapping, i.e., they have no unique placement in the genome. The two key parameters that determine whether a read is multi-mapping are read length and genome complexity. Long reads can now span difficult heterochromatic regions, including full centromeres, and characterize chromosomes from “telomere to telomere”. Moreover, identical reads or repeat arrays can be differentiated by their epigenetic marks, such as methylation patterns, which aids the assembly process. This is despite the fact that long reads still contain a modest percentage of sequencing errors, which impair both the accuracy and the speed of aligners and assemblers. Here, I review the proposed and implemented solutions to repeat resolution and the multi-mapping read problem, as well as the downstream consequences of reference choice, repeat masking, and proper representation of sex chromosomes. I also consider the forthcoming challenges and solutions with regard to long reads, where the problem is expected to shift from repeat localization within a single individual to repeat positioning within pangenomes.
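
To make the dependence of multi-mapping on read length concrete, here is a minimal toy sketch in Python. It is not taken from the article: the synthetic genome layout, the function names (`make_toy_genome`, `multi_mapping_fraction`), and all parameter values are assumptions chosen purely for illustration. Reads are error-free substrings, and a read counts as multi-mapping if its exact sequence occurs at more than one position.

```python
import random
from collections import Counter


def make_toy_genome(unique_len=2000, repeat_len=50, copies=20, seed=0):
    """Build a random DNA string with identical copies of one repeat inserted.

    Hypothetical layout: `copies` random segments, each followed by the
    same `repeat_len`-bp repeat. All values are illustrative assumptions.
    """
    rng = random.Random(seed)
    repeat = "".join(rng.choice("ACGT") for _ in range(repeat_len))
    segment_len = unique_len // copies
    parts = []
    for _ in range(copies):
        parts.append("".join(rng.choice("ACGT") for _ in range(segment_len)))
        parts.append(repeat)
    return "".join(parts)


def multi_mapping_fraction(genome, read_len):
    """Fraction of error-free reads of length `read_len` that occur
    at more than one position in `genome` (exact matches only)."""
    reads = [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]
    counts = Counter(reads)
    multi = sum(1 for read in reads if counts[read] > 1)
    return multi / len(reads)


genome = make_toy_genome()
for read_len in (25, 50, 100):
    fraction = multi_mapping_fraction(genome, read_len)
    print(f"read length {read_len:>3}: {fraction:.3f} multi-mapping")
```

In this toy setting, reads shorter than the repeat can fall entirely inside it and therefore match every copy, whereas reads longer than the repeat always include flanking sequence that anchors them. Real aligners must additionally cope with sequencing errors and diverged repeat copies, which this sketch ignores.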