BMC Genomics

ExUTR: a novel pipeline for large-scale prediction of 3′-UTR sequences from NGS data



Abstract

The three prime untranslated region (3′-UTR) is known to play a pivotal role in modulating gene expression by determining the fate of mRNA. Many crucial developmental events, such as mammalian spermatogenesis, tissue patterning, sex determination and neurogenesis, rely heavily on post-transcriptional regulation by the 3′-UTR. However, 3′-UTR biology remains a relatively untapped field, with only limited tools and 3′-UTR resources available. To elucidate how the 3′-UTR regulates gene expression, the 3′-UTR sequences must first be identified. Current 3′-UTR mining tools, such as GETUTR, 3USS and UTRscan, all depend on a well-annotated reference genome or curated 3′-UTR sequences, which hinders their application to the many non-model organisms for which genomes are not available. To address these issues, an automated NGS-based pipeline is needed for genome-wide 3′-UTR prediction in the absence of reference genomes. Here, we propose ExUTR, a novel NGS-based pipeline to predict and retrieve 3′-UTR sequences from RNA-Seq experiments, designed particularly for non-model species lacking well-annotated genomes. This pipeline integrates cutting-edge bioinformatics tools, databases (Uniprot and UTRdb) and novel in-house Perl scripts, implementing a fully automated workflow. Taking transcriptome assemblies as input, the pipeline identifies 3′-UTR signals based primarily on the intrinsic features of transcripts, and outputs predicted 3′-UTR candidates together with associated annotations. In addition, ExUTR requires only minimal computational resources, so it can run on a standard desktop computer with reasonable runtime, making it affordable for most laboratories.
We also demonstrate the functionality and extensibility of this pipeline using publicly available RNA-Seq data from both model and non-model species, and further validate the accuracy of the predicted 3′-UTRs using both well-characterized 3′-UTR resources and 3P-Seq data. ExUTR is a practical and powerful workflow that enables rapid genome-wide 3′-UTR discovery from NGS data. The candidates predicted through this pipeline will further advance the study of miRNA target prediction, cis elements in 3′-UTRs, and the evolution and biology of 3′-UTRs. Being independent of a well-annotated reference genome will dramatically expand its application to a much broader research area, encompassing all species for which RNA-Seq is available.
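
To illustrate what predicting a 3′-UTR from "the intrinsic features of transcripts" can mean, the following is a minimal Python sketch, not the ExUTR implementation (which the abstract describes as Perl scripts combined with external tools and databases). It assumes, purely for illustration, that the coding region of an assembled transcript is its longest forward-strand ATG-to-stop open reading frame, and it reports everything downstream of that stop codon as the 3′-UTR candidate. The function names `longest_orf` and `candidate_3utr` and the `min_len` parameter are hypothetical.

```python
# Illustrative sketch only: assumes the longest forward-strand ORF marks the
# coding region. This is NOT the ExUTR algorithm; all names are hypothetical.

STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def longest_orf(seq):
    """Return (start, end) of the longest ATG..stop ORF on the forward strand.

    `end` is exclusive and includes the stop codon. Returns None if no
    complete ORF is found in any of the three forward reading frames.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                start = None  # close the ORF and look for the next one
    return best


def candidate_3utr(seq, min_len=1):
    """Return the sequence downstream of the longest ORF, or None.

    `min_len` is a hypothetical filter that discards very short candidates.
    """
    orf = longest_orf(seq)
    if orf is None:
        return None
    utr = seq.upper()[orf[1]:]
    return utr if len(utr) >= min_len else None


if __name__ == "__main__":
    # Toy transcript: 2 nt of 5' sequence, a short ORF, then downstream sequence.
    transcript = "GGATGAAACCCTAAGCTTAATAAAGC"
    print(candidate_3utr(transcript))
```

A real workflow would also need to handle strand orientation, incomplete ORFs and homology evidence, all of which this sketch leaves out.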
