Journal: BMC Genomics

Accurate inference of isoforms from multiple sample RNA-Seq data


Abstract

Background: RNA-Seq based transcriptome assembly has become a fundamental technique for studying expressed mRNAs (i.e., transcripts or isoforms) in a cell using high-throughput sequencing technologies, and is serving as a basis to analyze the structural and quantitative differences of expressed isoforms between samples. However, the current transcriptome assembly algorithms are not specifically designed to handle large amounts of errors that are inherent in real RNA-Seq datasets, especially those involving multiple samples, making downstream differential analysis applications difficult. On the other hand, multiple sample RNA-Seq datasets may provide more information than single sample datasets that can be utilized to improve the performance of transcriptome assembly and abundance estimation, but such information remains overlooked by the existing assembly tools.

Results: We formulate a computational framework of transcriptome assembly that is capable of handling noisy RNA-Seq reads and multiple sample RNA-Seq datasets efficiently. We show that finding an optimal solution under this framework is an NP-hard problem. Instead, we develop an efficient heuristic algorithm, called Iterative Shortest Path (ISP), based on linear programming (LP) and integer linear programming (ILP). Our preliminary experimental results on both simulated and real datasets and comparison with the existing assembly tools demonstrate that (i) the ISP algorithm is able to assemble transcriptomes with a greatly increased precision while keeping the same level of sensitivity, especially when many samples are involved, and (ii) its assembly results help improve downstream differential analysis. The source code of ISP is freely available at http://alumni.cs.ucr.edu/~liw/isp.html.
