Source: NIH literature, G3: Genes|Genomes|Genetics

Accurate Identification and Analysis of Human mRNA Isoforms Using Deep Long Read Sequencing

Abstract

Precise identification of RNA-coding regions and transcriptomes of eukaryotes is a significant problem in biology. Currently, eukaryotic transcriptomes are analyzed using deep short-read sequencing of complementary DNA (cDNA). The resulting short reads are then aligned against a genome and against annotated splice junctions to infer biological meaning. Here we use long-read cDNA datasets to analyze a eukaryotic transcriptome, generating two large datasets from the human K562 and HeLa S3 cell lines. Each dataset comprised at least 4 million reads and had a median read length greater than 500 bp. We show that annotation-independent alignments of these reads yield partial gene structures in close agreement with annotated gene structures; 15% of these structures were not recovered in a previous de novo analysis of short reads. For long noncoding RNA (lncRNA) genes, however, we find a larger fraction of novel gene structures among our alignments. Other important aspects of transcriptome analysis, such as describing cell type-specific splicing, can be performed in an accurate, reliable and completely annotation-free manner, making the approach well suited to analyzing the transcriptomes of newly sequenced genomes. Furthermore, we demonstrate that long reads can be assembled into full-length transcripts with considerable success. Our method is applicable to all long-read sequencing technologies.
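The abstract describes deriving partial gene structures from annotation-independent long-read alignments and comparing them with annotated gene structures. The Python sketch below illustrates one generic way to do that comparison; it is not the authors' pipeline. It assumes spliced alignments in SAM format, where an `N` CIGAR operation marks a skipped reference region that is read as an intron, and it assumes a read agrees with the annotation when its intron chain occurs as a contiguous run of introns in some annotated transcript. The function names `intron_chain`, `is_subchain` and `novel_fraction` are hypothetical, as are the example coordinates.

```python
import re
from typing import List, Sequence, Tuple

Intron = Tuple[int, int]

# SAM CIGAR operations that consume reference bases: M, D, N, =, X.
_REF_CONSUMING = frozenset("MDN=X")
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def intron_chain(ref_start: int, cigar: str) -> List[Intron]:
    """Introns implied by one spliced alignment, as 0-based half-open
    (start, end) reference intervals. ref_start is SAM POS minus 1."""
    pos = ref_start
    introns: List[Intron] = []
    for length_str, op in _CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op == "N":  # skipped reference region, treated here as an intron
            introns.append((pos, pos + length))
        if op in _REF_CONSUMING:  # I, S, H, P do not advance the reference
            pos += length
    return introns


def is_subchain(sub: Sequence[Intron], full: Sequence[Intron]) -> bool:
    """True if `sub` occurs as a contiguous run of introns inside `full`."""
    n = len(sub)
    return any(tuple(full[i:i + n]) == tuple(sub)
               for i in range(len(full) - n + 1))


def novel_fraction(read_chains, annotated_chains) -> float:
    """Fraction of distinct spliced read intron chains that are not a
    contiguous sub-chain of any annotated transcript's intron chain."""
    distinct = {tuple(c) for c in read_chains if c}  # skip unspliced reads
    if not distinct:
        return 0.0
    novel = [c for c in distinct
             if not any(is_subchain(c, a) for a in annotated_chains)]
    return len(novel) / len(distinct)


# Hypothetical example: one annotated two-intron transcript, four reads.
annotated = [[(150, 350), (400, 700)]]
reads = [
    intron_chain(100, "50M200N30M10I20M300N40M"),  # full annotated chain
    intron_chain(120, "30M200N50M"),               # partial, first intron only
    intron_chain(760, "40M100N40M"),               # intron not in annotation
    intron_chain(0, "90M"),                        # unspliced, ignored
]
print(novel_fraction(reads, annotated))  # prints 0.3333333333333333
```

Matching on contiguous sub-chains rather than exact chains is a deliberate choice in this sketch: a long read that covers only part of a transcript can still be fully consistent with the annotation, which is how "partial gene structures" can agree with annotated ones.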
