Proceedings of the National Academy of Sciences of the United States of America

Sparse linear modeling of next-generation mRNA sequencing (RNA-Seq) data for isoform discovery and abundance estimation


Abstract

Since the inception of next-generation mRNA sequencing (RNA-Seq) technology, various attempts have been made to use RNA-Seq data to assemble full-length mRNA isoforms de novo and to estimate isoform abundance. However, for genes with more than a few exons, the problem tends to be challenging and often involves identifiability issues in statistical modeling. We have developed a statistical method called “sparse linear modeling of RNA-Seq data for isoform discovery and abundance estimation” (SLIDE) that takes exon boundaries and RNA-Seq data as input to discern the set of mRNA isoforms that are most likely to be present in an RNA-Seq sample. SLIDE is based on a linear model with a design matrix that models the sampling probability of RNA-Seq reads from different mRNA isoforms. To tackle the model unidentifiability issue, SLIDE uses a modified Lasso procedure for parameter estimation. Compared with deterministic isoform assembly algorithms (e.g., Cufflinks), SLIDE considers the stochastic aspects of RNA-Seq reads in exons from different isoforms and thus has increased power in detecting more novel isoforms. Another advantage of SLIDE is its flexibility in incorporating other transcriptomic data such as RACE, CAGE, and EST into its model to further increase isoform discovery accuracy. SLIDE can also work downstream of other RNA-Seq assembly algorithms to integrate newly discovered genes and exons. Besides isoform discovery, SLIDE sequentially uses the same linear model to estimate the abundance of discovered isoforms. Simulation and real data studies show that SLIDE performs as well as or better than major competitors in both isoform discovery and abundance estimation. The SLIDE software package is available at .
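
To make the idea of a sparse linear model over candidate isoforms concrete, the following is a minimal toy sketch and not the SLIDE implementation. It assumes a hypothetical 3-exon gene, uses 0/1 indicators (whether an isoform contains an exon or a junction) in place of SLIDE's read-sampling probabilities, and substitutes a plain non-negative Lasso solved by coordinate descent for SLIDE's modified Lasso procedure. All function names, the bin definition, and the penalty value are illustrative assumptions.

```python
from itertools import combinations

def candidate_isoforms(n_exons):
    """All non-empty ordered exon subsets (hypothetical candidate set)."""
    exons = range(1, n_exons + 1)
    return [c for size in range(1, n_exons + 1)
            for c in combinations(exons, size)]

def observation_bins(n_exons):
    """Toy bins: one per exon plus one per possible exon-exon junction."""
    exons = range(1, n_exons + 1)
    return [(e,) for e in exons] + list(combinations(exons, 2))

def design_matrix(bins, isoforms):
    """F[j][k] = 1 if isoform k contains bin j (a simplification:
    SLIDE models read-sampling probabilities instead of indicators)."""
    def covers(iso, b):
        if len(b) == 1:
            return b[0] in iso
        return b in zip(iso, iso[1:])  # junction between consecutive exons
    return [[1.0 if covers(iso, b) else 0.0 for iso in isoforms]
            for b in bins]

def nonneg_lasso(F, b, lam, sweeps=5000):
    """Minimise 0.5*||b - F p||^2 + lam*sum(p) subject to p >= 0
    by cyclic coordinate descent (a stand-in for the modified Lasso)."""
    n_bins, n_iso = len(F), len(F[0])
    p = [0.0] * n_iso
    r = list(b)  # residual b - F p
    col_sq = [sum(F[j][k] ** 2 for j in range(n_bins)) for k in range(n_iso)]
    for _ in range(sweeps):
        for k in range(n_iso):
            if col_sq[k] == 0.0:
                continue
            rho = sum(F[j][k] * r[j] for j in range(n_bins)) + col_sq[k] * p[k]
            new = max(0.0, (rho - lam) / col_sq[k])
            delta = new - p[k]
            if delta != 0.0:
                for j in range(n_bins):
                    r[j] -= F[j][k] * delta
                p[k] = new
    return p

# Simulated noiseless data: isoform (1,2,3) at 0.6 and isoform (1,3) at 0.4.
isoforms = candidate_isoforms(3)       # 7 candidates
bins = observation_bins(3)             # 6 observations, so F is rank-deficient
F = design_matrix(bins, isoforms)
truth = {(1, 2, 3): 0.6, (1, 3): 0.4}
b = [sum(F[j][k] * truth.get(iso, 0.0) for k, iso in enumerate(isoforms))
     for j in range(len(bins))]

estimate = nonneg_lasso(F, b, lam=0.01)
detected = [iso for iso, p in zip(isoforms, estimate) if p > 1e-3]
print(detected)
print([round(p, 3) for p in estimate])
```

With seven candidate isoforms and only six bins, ordinary least squares has no unique solution here; the sparsity penalty together with the non-negativity constraint is what selects a small isoform set. The same fitted coefficients then serve as rough abundance estimates, shrunk slightly toward zero by the penalty.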
