IEEE International Conference on Bioinformatics and Biomedicine

Performance comparison and an ensemble approach of transcriptome assembly

Abstract

Accurate transcriptome assembly from next-generation sequencing data is crucial for gene expression analysis. However, different assemblers have been observed to produce substantially different outputs from the same RNA-Seq data, and even a single method often assembles different sets of transcripts when run with different parameter settings. In this study, we performed a comparative analysis of transcriptome assemblers, including four de novo and three genome-guided methods, using simulated RNA-Seq data that model Illumina HiSeq sequencing of the Arabidopsis thaliana and Zea mays (strain B73) transcriptomes. No assembler reconstructed all of the reference transcripts correctly, and a large fraction (~30%) of transcripts were not assembled correctly by any assembler. Furthermore, each assembler recovered a different subset of the reference transcripts, with very few common to all of them. Although on one dataset the de novo tools correctly assembled about as many transcripts as the genome-guided tools, they also produced far more incorrectly assembled transcripts. These results indicate that there is still considerable room for improving transcriptome assembly. We therefore investigated a consensus-based ensemble approach. For example, by taking the consensus set of contigs shared among three or more de novo assemblers, we correctly identified 10% more transcripts in the Arabidopsis thaliana datasets. The ratio of incorrect to correct contigs ranged from 4.9 (Trinity) to 10.7 (SOAPdenovo) for the de novo assemblers, but only from 1.3 to 1.7 for the genome-guided methods. With the consensus de novo method, we reduced this ratio to 1.5, very close to or even lower than the ratios obtained by the genome-guided methods. These results provide a direction for building a better ensemble approach that can reconstruct all of the correct transcripts.
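The consensus filter and the incorrect-to-correct ratio described above can be illustrated with a small Python sketch. This is not the authors' pipeline: it assumes that two contigs are "shared" only when their sequences are identical up to reverse complement, and that a contig is "correct" only when it exactly matches a reference transcript. The function names and toy sequences are hypothetical. `min_support=3` mirrors the "three or more de novo assemblers" threshold.

```python
from collections import Counter

# Map each base to its complement; N (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def canonical(seq: str) -> str:
    """Return the lexicographically smaller of a sequence and its reverse complement.

    De novo assemblers may report a transcript on either strand, so both
    orientations are collapsed to one key (a simplifying assumption).
    """
    seq = seq.upper()
    rev_comp = seq.translate(_COMPLEMENT)[::-1]
    return min(seq, rev_comp)


def consensus_contigs(assemblies: dict[str, list[str]], min_support: int = 3) -> set[str]:
    """Keep contigs reported by at least `min_support` distinct assemblers.

    Assumption: contigs count as 'shared' only if identical up to reverse
    complement; a real pipeline would match them by alignment instead.
    """
    support = Counter()
    for contigs in assemblies.values():
        # Use a set so duplicate contigs from one assembler count as one vote.
        support.update({canonical(c) for c in contigs})
    return {seq for seq, votes in support.items() if votes >= min_support}


def incorrect_to_correct_ratio(contigs, reference) -> float:
    """Contigs matching no reference transcript divided by contigs that match one.

    Assumption: 'correct' means an exact match (up to reverse complement).
    Returns infinity when no contig is correct.
    """
    ref_keys = {canonical(r) for r in reference}
    contig_keys = {canonical(c) for c in contigs}
    correct = len(contig_keys & ref_keys)
    incorrect = len(contig_keys) - correct
    return incorrect / correct if correct else float("inf")


if __name__ == "__main__":
    # Toy, made-up data: "GGGGTTTT" and "TTAAACCC" are the reverse complements
    # of "AAAACCCC" and "GGGTTTAA", respectively.
    toy = {
        "tool_a": ["AAAACCCC", "GGGTTTAA", "ACGACGAC"],
        "tool_b": ["GGGGTTTT", "GGGTTTAA", "CATCATCA"],
        "tool_c": ["AAAACCCC", "TTAAACCC", "CATCATCA"],
        "tool_d": ["AAAACCCC", "ACGACGAC"],
    }
    reference = ["AAAACCCC", "TTAAACCC"]
    for name, contigs in toy.items():
        print(name, incorrect_to_correct_ratio(contigs, reference))
    shared = consensus_contigs(toy, min_support=3)
    print("consensus", sorted(shared), incorrect_to_correct_ratio(shared, reference))
```

On this toy input each individual tool has a ratio of 0.5 or 1.0. The two contigs supported by at least three tools are both correct, so the consensus ratio is 0.0. Because raising `min_support` can only shrink the consensus set, it can also discard correct contigs that few tools recover. Real assemblies would additionally need an alignment-based notion of a shared or correct contig.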