Journal: Scientific Reports

lncScore: alignment-free identification of long noncoding RNA from assembled novel transcripts


Abstract

RNA-Seq based transcriptome assembly has been widely used to identify novel lncRNAs. However, the best-performing transcript reconstruction methods merely identified 21% of full-length protein-coding transcripts from H. sapiens. Those partial-length protein-coding transcripts are more likely to be classified as lncRNAs due to their incomplete CDS, leading to a higher false positive rate for lncRNA identification. Furthermore, potential sequencing or assembly errors that gain or abolish stop codons also complicate ORF-based prediction of lncRNAs. Therefore, it remains a challenge to identify lncRNAs from the assembled transcripts, particularly the partial-length ones. Here, we present a novel alignment-free tool, lncScore, which uses a logistic regression model with 11 carefully selected features. Compared with other state-of-the-art alignment-free tools (e.g. CPAT, CNCI, and PLEK), lncScore distinguishes lncRNAs from mRNAs more accurately, especially for partial-length mRNAs in the human and mouse datasets. In addition, lncScore also performed well on transcripts from five other species (Zebrafish, Fly, C. elegans, Rat, and Sheep). To speed up the prediction, multithreading is implemented within lncScore: with 12 threads, it took only 2 minutes to classify 64,756 transcripts and 54 seconds to train a new model with 21,000 transcripts, which is much faster than other tools. lncScore is available at https://github.com/WGLab/lncScore.
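The difficulty with ORF-based prediction described above can be illustrated with a small sketch. It is not lncScore's code, and the abstract does not list the 11 features, so nothing here should be read as one of them. The function name `longest_orf_length` and the toy sequences are hypothetical, and the ORF definition (first ATG to the next in-frame stop codon, forward strand only) is a simplifying assumption.

```python
# Illustrative sketch only: a naive longest-ORF measure, NOT lncScore's method.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_orf_length(seq: str) -> int:
    """Return the length (nt) of the longest ATG...stop ORF in the three forward frames."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None  # position of the ATG that opened the current ORF, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)  # include the stop codon
                start = None
    return best

# Hypothetical toy transcript: 5' flank, start codon, 10 codons, stop codon, 3' flank.
full_length = "GG" + "ATG" + "GCT" * 10 + "TAA" + "CC"
# An assembly or sequencing error that introduces a premature stop codon.
stop_gain = "GG" + "ATG" + "GCT" * 2 + "TAG" + "GCT" * 7 + "TAA" + "CC"
# A partial-length assembly that lost its 5' end, including the start codon.
partial = full_length[5:]

print(longest_orf_length(full_length))  # 36
print(longest_orf_length(stop_gain))    # 12
print(longest_orf_length(partial))      # 0
```

Under this naive measure, the same coding transcript looks much less "coding" once it is truncated or carries a spurious stop codon. This is the kind of failure that would push a partial-length mRNA toward an lncRNA call.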
