Nucleic Acids Research

Analysis of strand-specific RNA-seq data using machine learning reveals the structures of transcription units in Clostridium thermocellum

Abstract

Identification of transcription units (TUs) encoded in a bacterial genome is essential to elucidation of transcriptional regulation of the organism. To gain a detailed understanding of the dynamically composed TU structures, we have used four strand-specific RNA-seq (ssRNA-seq) datasets collected under two experimental conditions to derive the genomic TU organization of Clostridium thermocellum using a machine-learning approach. Our method accurately predicted the genomic boundaries of individual TUs based on two sets of parameters measuring the RNA-seq expression patterns across the genome: expression-level continuity and variance. A total of 2590 distinct TUs are predicted based on the four RNA-seq datasets. Among the predicted TUs, 44% have multiple genes. We assessed our prediction method on an independent set of RNA-seq data with longer reads. The evaluation confirmed the high quality of the predicted TUs. Functional enrichment analyses on a selected subset of the predicted TUs revealed interesting biology. To demonstrate the generality of the prediction method, we have also applied the method to RNA-seq data collected on Escherichia coli and achieved high prediction accuracies. The TU prediction program named SeqTU is publicly available at https://code.google.com/p/seqtu/. We expect that the predicted TUs can serve as the baseline information for studying transcriptional and post-transcriptional regulation in C. thermocellum and other bacteria.
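The abstract names two families of features, expression-level continuity and variance, but does not define them. As a rough illustration only, the sketch below computes one plausible version of each for a pair of adjacent genes on the same strand. This is not SeqTU's code: the function name `tu_pair_features`, the specific feature definitions, and the half-open coordinate convention are all assumptions made for this example.

```python
from statistics import mean, pstdev


def tu_pair_features(coverage, gene_a, gene_b):
    """Hypothetical features for deciding whether two adjacent genes
    on one strand might share a transcription unit.

    coverage: per-base read depth for one strand (list of numbers).
    gene_a, gene_b: (start, end) half-open intervals, gene_a upstream.
    """
    a = coverage[gene_a[0]:gene_a[1]]
    b = coverage[gene_b[0]:gene_b[1]]
    gap = coverage[gene_a[1]:gene_b[0]]      # intergenic region
    span = coverage[gene_a[0]:gene_b[1]]     # both genes plus the gap

    flank_mean = (mean(a) + mean(b)) / 2

    # Continuity (assumed definition): how much of the gap is covered
    # at all, and how deep it is relative to the flanking genes.
    if gap:
        gap_covered = sum(1 for depth in gap if depth > 0) / len(gap)
        gap_ratio = mean(gap) / flank_mean if flank_mean > 0 else 0.0
    else:
        # Abutting or overlapping genes: nothing interrupts the signal.
        gap_covered, gap_ratio = 1.0, 1.0

    # Variance (assumed definition): coefficient of variation of depth
    # across the whole candidate unit.
    span_mean = mean(span)
    cv = pstdev(span) / span_mean if span_mean > 0 else 0.0

    return {"gap_covered": gap_covered, "gap_ratio": gap_ratio, "cv": cv}


# Toy inputs, not real data: even coverage vs. a dropout between genes.
even = tu_pair_features([10] * 20, (0, 8), (12, 20))
broken = tu_pair_features([10] * 8 + [0] * 4 + [10] * 8, (0, 8), (12, 20))
```

In a pipeline like the one the abstract describes, feature vectors of this kind would be passed to a trained classifier; the abstract does not say which model is used, so none is shown here.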