Journal: PLoS Computational Biology

Estimating repeat spectra and genome length from low-coverage genome skims with RESPECT

Abstract

The cost of sequencing a genome is dropping much faster than the cost of assembling and finishing it. The use of lightly sampled genomes (genome skims) could be transformative for genomic ecology, and results using k-mers have shown the advantage of this approach in the identification and phylogenetic placement of eukaryotic species. Here, we revisit the basic question of estimating genomic parameters such as genome length, coverage, and repeat structure, focusing specifically on estimating the k-mer repeat spectrum. We show, using a mix of theoretical and empirical analysis, that there are fundamental limitations to estimating the k-mer spectra due to ill-conditioned systems, and that this has implications for estimating other genomic parameters. We get around this problem using a novel constrained optimization approach (Spline Linear Programming), where the constraints are learned empirically. On reads simulated at 1X coverage from 66 genomes, our method, REPeat SPECTra Estimation (RESPECT), had 2.2% error in length estimation, compared to the 27% error previously achieved. In shotgun-sequenced read samples with contaminants, RESPECT length estimates had a median error of 4%, in contrast to other methods, which had a median error of 80%. Together, the results suggest that low-pass genomic sequencing can yield reliable estimates of the length and repeat content of the genome. The RESPECT software will be publicly available at https://github.com/shahab-sarmashghi/RESPECT.git.
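
To make the term "k-mer repeat spectrum" concrete, the following is a minimal Python sketch, not RESPECT's method. It assumes a fully known, single-stranded toy sequence and no sequencing reads or errors, and the names `repeat_spectrum`, `length_from_spectrum`, and `toy_genome` are hypothetical. It computes r_j, the number of distinct k-mers occurring exactly j times, and shows that the sequence length follows from the spectrum as the sum of j * r_j plus k - 1. The problem the abstract addresses is harder: the spectrum must be inferred from low-coverage reads, where this direct count is not available.

```python
from collections import Counter


def repeat_spectrum(genome: str, k: int) -> dict[int, int]:
    """Return {j: r_j}: r_j is the number of distinct k-mers seen exactly j times."""
    # Count every k-mer on the given strand (no canonical/reverse-complement merging).
    kmer_counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    # Histogram of the multiplicities: how many k-mers have each copy number.
    return dict(Counter(kmer_counts.values()))


def length_from_spectrum(spectrum: dict[int, int], k: int) -> int:
    """Recover sequence length: total k-mer positions is sum(j * r_j) = L - k + 1."""
    return sum(j * r_j for j, r_j in spectrum.items()) + k - 1


toy_genome = "ACGTACGTTTGACCA"  # illustrative sequence, not real data
spectrum = repeat_spectrum(toy_genome, k=4)
estimated_length = length_from_spectrum(spectrum, k=4)
```

For the toy sequence, one 4-mer (`ACGT`) occurs twice and ten occur once, so the recovered length equals `len(toy_genome)`.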
