Performance comparison and an ensemble approach of transcriptome assembly

Abstract

Accurate transcriptome assembly from next-generation sequencing data is crucial for gene expression analysis. However, different assemblers generate significantly different outputs from the same RNA-Seq data, and even the same method often assembles different sets of transcripts when different parameter settings are used. In this study, we performed a comparative analysis of transcriptome assemblers, including four de novo and three genome-guided methods, using simulated RNA-Seq data modeling Illumina Hi-Seq sequencing of the Arabidopsis thaliana and Zea mays strain B73 transcriptomes. No assembler was able to reconstruct all of the reference transcripts correctly, and a large fraction (~30%) of transcripts were not assembled correctly by any assembler. Furthermore, each assembler recovered a different set of reference transcripts, with very few common to all. While the de novo tools correctly assembled a similar number of transcripts to the genome-guided tools for one dataset, they also produced far more incorrectly assembled transcripts. These results indicate that considerable room remains for improving transcriptome assembly. We therefore investigated a consensus-based ensemble approach. By taking the consensus contig set shared, for example, among three or more de novo assemblers, 10% more transcripts were correctly identified for the Arabidopsis thaliana datasets. The incorrect-to-correct contig ratio ranged from 4.9 (Trinity) to 10.7 (SOAPdenovo) for the de novo assemblers, and from 1.3 to 1.7 for the genome-guided methods. Using the consensus de novo method, we reduced the ratio to a level very close to or even lower than those obtained by the genome-guided methods (1.5). The results of this study provide a direction for building a better ensemble approach that can reconstruct all the correct transcripts.
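The consensus step and the incorrect-to-correct ratio described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how contigs are matched across assemblers or against the reference, so the sketch assumes exact sequence identity. The function names, the assembler labels, and the toy sequences are all hypothetical.

```python
from collections import Counter


def consensus_contigs(assemblies, min_support=3):
    """Return contigs reported by at least `min_support` assemblers.

    `assemblies` maps an assembler label to its set of contig sequences.
    Assumption: two contigs are "shared" only if their sequences are identical.
    """
    support = Counter()
    for contigs in assemblies.values():
        # set() ensures each assembler votes at most once per contig
        support.update(set(contigs))
    return {seq for seq, votes in support.items() if votes >= min_support}


def incorrect_to_correct_ratio(contigs, reference):
    """Ratio of contigs absent from the reference to contigs present in it."""
    correct = len(contigs & reference)
    incorrect = len(contigs - reference)
    if correct == 0:
        raise ValueError("no correct contigs; ratio is undefined")
    return incorrect / correct


# Toy data (hypothetical): four assemblers and a three-transcript reference.
assemblies = {
    "asm_a": {"ATGC", "GGCC", "TTAA"},
    "asm_b": {"ATGC", "GGCC", "CCGG"},
    "asm_c": {"ATGC", "TTAA", "GGCC"},
    "asm_d": {"ATGC", "AAAA"},
}
reference = {"ATGC", "GGCC", "CGCG"}

consensus = consensus_contigs(assemblies, min_support=3)
print(sorted(consensus))
print(incorrect_to_correct_ratio(assemblies["asm_a"], reference))
print(incorrect_to_correct_ratio(consensus, reference))
```

In this toy case the consensus set drops the contigs that only one or two assemblers produced, which is the mechanism by which a voting threshold can lower the incorrect-to-correct ratio. Note that it cannot recover a reference transcript that no assembler produced ("CGCG" here).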

Bibliographic details

Source
IEEE International Conference on Bioinformatics and Biomedicine | 2017 | p. 769 | 3 pages
Authors
Sairam Behera; Adam Voshall; Jitender S. Deogun; Etsuko N. Moriyama
Original format
PDF
Chinese Library Classification
Q81-53
Keywords
Bioinformatics; Genomics; Tools; Sequential analysis; Technological innovation; Data models; Computer science
