BMC Bioinformatics

Network motif-based identification of transcription factor-target gene relationships by integrating multi-source biological data


Abstract

Background: Integrating data from multiple global assays and curated databases is essential to understanding the spatio-temporal interactions within cells. Different experiments measure cellular processes at varying breadth and depth, while databases contain biological information based on established facts or published data. Integrating these complementary datasets helps infer a mutually consistent transcriptional regulatory network (TRN) whose structure closely resembles that of the underlying genetic regulatory modules. Decomposing the TRN into a small set of recurring regulatory patterns, called network motifs (NMs), facilitates the inference. Identifying NMs defined by specific transcription factors (TFs) establishes the framework structure of a TRN and allows the inference of TF-target gene relationships. This paper introduces a computational framework that uses data from multiple sources to infer TF-target gene relationships on the basis of NMs. The data include time course gene expression profiles, genome-wide location analysis data, binding sequence data, and gene ontology (GO) information.

Results: The proposed computational framework was tested using gene expression data associated with cell cycle progression in yeast. Among 800 cell cycle related genes, 85 were identified as candidate TFs and classified into four previously defined NMs. The NMs for a subset of TFs were obtained from the literature. Support vector machine (SVM) classifiers were used to estimate NMs for the remaining TFs. The potential downstream target genes for the TFs were clustered into 34 biologically significant groups. The relationships between TFs and potential target gene clusters were examined by training recurrent neural networks whose topologies mimic the NMs to which the TFs are assigned. The identified relationships between TFs and gene clusters were evaluated using the following biological validation and statistical analyses: (1) gene set enrichment analysis (GSEA) to evaluate the clustering results; (2) leave-one-out cross-validation (LOOCV) to ensure that the SVM classifiers assign TFs to NM categories with high confidence; (3) binding site enrichment analysis (BSEA) to determine enrichment of the gene clusters for the cognate binding sites of their predicted TFs; (4) comparison with previously reported results in the literature to confirm the inferred regulations.

Conclusion: The major contribution of this study is the development of a computational framework to assist the inference of TRNs by integrating heterogeneous data from multiple sources and by decomposing a TRN into NM-based modules. The inference capability of the proposed framework is verified statistically (e.g., LOOCV) and biologically (e.g., GSEA, BSEA, and literature validation). The proposed framework is useful for inferring small NM-based modules of TF-target gene relationships that can serve as a basis for generating new testable hypotheses.
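The binding site enrichment step listed under Results can be illustrated with a small sketch. The abstract does not say which statistic the authors used for BSEA, so the one-sided hypergeometric test below is an assumption, chosen because it is a common way to ask whether a gene cluster contains more binding-site-bearing genes than chance would predict. The function name, parameter names, and all counts other than the 800-gene background are hypothetical.

```python
from math import comb


def hypergeom_enrichment_pvalue(background_size, background_hits,
                                cluster_size, cluster_hits):
    """One-sided hypergeometric p-value, P(X >= cluster_hits).

    X is the number of genes carrying a TF's binding site when
    cluster_size genes are drawn without replacement from a background
    of background_size genes, background_hits of which carry the site.
    Assumption: this is a generic enrichment test, not necessarily the
    statistic used in the paper.
    """
    if not 0 <= background_hits <= background_size:
        raise ValueError("background_hits must lie in [0, background_size]")
    if not 0 <= cluster_size <= background_size:
        raise ValueError("cluster_size must lie in [0, background_size]")

    max_hits = min(background_hits, cluster_size)
    if cluster_hits > max_hits:
        return 0.0  # more hits than the cluster could possibly contain

    misses = background_size - background_hits
    total = comb(background_size, cluster_size)
    # Sum the probability mass of every outcome at least as extreme.
    tail = sum(
        comb(background_hits, k) * comb(misses, cluster_size - k)
        for k in range(max(cluster_hits, 0), max_hits + 1)
    )
    return tail / total


# Illustrative counts only: 800 background genes (from the abstract),
# 40 hypothetical site-bearing genes, and a hypothetical 20-gene cluster.
p_enriched = hypergeom_enrichment_pvalue(800, 40, 20, 5)
p_baseline = hypergeom_enrichment_pvalue(800, 40, 20, 1)
print(f"5 of 20 genes with site: p = {p_enriched:.3g}")
print(f"1 of 20 genes with site: p = {p_baseline:.3g}")
```

A smaller p-value means the cluster holds more site-bearing genes than random sampling would be expected to produce. With 34 clusters and many TFs, a real analysis would also need a multiple-testing correction, which this sketch leaves out.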
