Published in: 2011 IEEE 1st International Conference on Computational Advances in Bio and Medical Sciences

Workshop: Inferring viral population from ultra-deep sequencing data

Abstract

Because existing high-throughput sequencing systems were originally designed for single-genome assembly, they cannot distinguish and simultaneously assemble multiple closely related sequences, nor can they estimate the relative abundances of those sequences. This paper presents a novel approach to quasispecies spectrum reconstruction, implemented in the ViSpA software. On simulated data, ViSpA accurately reconstructs up to 29 of 44 quasispecies in the absence of genotyping errors. ViSpA was also applied to real read data derived from a blood sample of an HCV-infected patient and sequenced on a Roche 454 Life Sciences machine; the sequenced region spans half of the genome. Each of the ten most frequent sequences reconstructed by the method represents a viable protein, and the most frequent sequence is within 1% of the actual ORF obtained by cloning the quasispecies. In contrast, ShoRAH reconstructed only one sequence that represents a viable protein; this sequence has 99.94% similarity to the fourth most frequent sequence assembled by ViSpA. Both methods returned similar frequency estimates for this sequence: 0.017% (ShoRAH) and 0.019% (ViSpA). The other nine of ShoRAH's top ten reconstructed quasispecies contain multiple stop codons in their amino-acid translations, which indicates uncorrected systematic indel errors introduced by the 454 Life Sciences machine. Additional experiments on 90% of the read data show that ViSpA robustly reproduces the ten most frequent assembled quasispecies.
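The abstract judges reconstructed sequences by two criteria: whether they encode a viable protein (no premature in-frame stop codons, which frameshifting indels tend to create) and how similar they are to a reference sequence. The sketch below shows one simple way to compute both. It is an illustration only, not code from ViSpA or ShoRAH, and it assumes (hypothetically) uppercase DNA strings that start at the first codon of the ORF, the standard genetic code's stop codons, and, for the similarity check, sequences already aligned to equal length with `-` for gaps. The function names are hypothetical.

```python
# Illustrative sketch only; not taken from ViSpA or ShoRAH.
# Assumptions: uppercase DNA, ORF starts at position 0, standard genetic code,
# and pre-aligned equal-length inputs for percent_identity.

STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def in_frame_stops(seq: str) -> list[int]:
    """Return 0-based codon indices of in-frame stops, excluding a terminal stop."""
    # Split into complete codons; a trailing partial codon is ignored.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    stops = [k for k, codon in enumerate(codons) if codon in STOP_CODONS]
    # A single stop at the very end is the normal ORF terminator, not an error.
    if stops and stops[-1] == len(codons) - 1:
        stops.pop()
    return stops


def looks_viable(seq: str) -> bool:
    """Heuristic: length is a whole number of codons and no premature stop occurs."""
    return len(seq) % 3 == 0 and not in_frame_stops(seq)


def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned columns where both sequences have the same character."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)
```

For example, `looks_viable("ATGAAACCCGGGTAA")` is `True`, while `looks_viable("ATGTGAAAATAA")` is `False` because of the premature `TGA` at codon 1, and `percent_identity("ACGTACGTAC", "ACGTACGTAA")` returns `90.0`. A real pipeline would align sequences first and might use a library translator instead of this hand-rolled codon scan.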