Conference: IEEE International Conference on Bioinformatics and Biomedicine

Performance comparison and an ensemble approach of transcriptome assembly


Abstract

Accurate transcriptome assembly from next-generation sequencing data is crucial for gene expression analysis. However, different assemblers have been observed to generate significantly different outputs from the same RNA-Seq data, and even a single method often assembles different sets of transcripts when run with different parameters. In this study, we performed a comparative analysis of various transcriptome assemblers, including four de novo and three genome-guided methods, using simulated RNA-Seq data modeling Illumina HiSeq sequencing of the Arabidopsis thaliana and Zea mays strain B73 transcriptomes. No assembler was able to reconstruct all of the reference transcripts correctly, and a large fraction (~30%) of transcripts were not assembled correctly by any assembler. Furthermore, each assembler correctly reconstructed a different set of reference transcripts, with very few common to all. While the de novo tools assembled about as many transcripts correctly as the genome-guided tools for one dataset, they also produced far more incorrectly assembled transcripts. These results indicate that substantial room remains for improving transcriptome assembly. We therefore investigated a consensus-based ensemble approach. By taking the consensus contig set shared, for example, among three or more de novo assemblers, 10% more transcripts were correctly identified for the Arabidopsis thaliana datasets. While the incorrect-to-correct contig ratio for the de novo assemblers ranged from 4.9 (for Trinity) to 10.7 (for SOAPdenovo), the ratios for the genome-guided methods ranged from 1.3 to 1.7. Using the consensus de novo method, we reduced the ratio to a level very close to or even lower than those obtained by the genome-guided methods (1.5). The results of this study provide a direction for building a better ensemble approach that can reconstruct all the correct transcripts.
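The consensus idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that two contigs "match" only when their sequences are identical, that a contig is "correct" only when it is identical to a reference transcript, and it uses hypothetical assembler names (`asm_a` to `asm_d`) and toy sequences. The abstract does not state the paper's actual matching criteria.

```python
from collections import Counter


def consensus_contigs(assemblies, min_support=3):
    """Return the contig sequences reported by at least `min_support` assemblers.

    `assemblies` maps an assembler name to its list of contig sequences.
    Assumption: contigs match only if their sequences are exactly identical.
    """
    counts = Counter()
    for contigs in assemblies.values():
        # Use a set so each assembler votes at most once per sequence.
        counts.update(set(contigs))
    return {seq for seq, n in counts.items() if n >= min_support}


def incorrect_to_correct_ratio(contigs, reference):
    """Ratio of contigs absent from the reference to contigs present in it."""
    correct = len(contigs & reference)
    incorrect = len(contigs - reference)
    return incorrect / correct if correct else float("inf")


# Toy data: hypothetical assemblers and sequences, for illustration only.
assemblies = {
    "asm_a": ["ATGC", "GGCA", "TTAG"],
    "asm_b": ["ATGC", "GGCA", "CCGT"],
    "asm_c": ["ATGC", "TTAG", "CCGT", "AAAA"],
    "asm_d": ["ATGC", "GGCA", "TTTT"],
}
reference = {"ATGC", "GGCA", "CCGT"}

consensus = consensus_contigs(assemblies, min_support=3)
consensus_ratio = incorrect_to_correct_ratio(consensus, reference)
single_ratio = incorrect_to_correct_ratio(set(assemblies["asm_c"]), reference)
```

In this toy case the consensus set drops the contigs that only one or two assemblers report, which lowers the incorrect-to-correct ratio compared with the single assembler `asm_c`. It also loses the correct contig `CCGT`, which only two assemblers produced, so the `min_support` threshold trades correct contigs recovered against incorrect contigs filtered out.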
