Journal: BMC Systems Biology

TF-Cluster: A pipeline for identifying functionally coordinated transcription factors via network decomposition of the shared coexpression connectivity matrix (SCCM)


Abstract

Background: Identifying the key transcription factors (TFs) controlling a biological process is the first step toward a better understanding of the underlying regulatory mechanisms. However, because gene regulatory networks (GRNs) involve a large number of genes and complex interactions, identifying the TFs involved in a biological process remains particularly difficult. The challenges include: (1) most eukaryotic genomes encode thousands of TFs, which are organized in gene families of various sizes and in many cases with poor sequence conservation, making it difficult to recognize the TFs for a given biological process; (2) transcription usually involves several hundred genes that generate a combination of intrinsic noise from upstream signaling networks and lead to fluctuations in transcription; (3) a TF can function in different cell types or developmental stages. Currently, very few methods are available for identifying the TFs involved in biological processes, and novel, more powerful methods are urgently needed.

Results: We developed a computational pipeline called TF-Cluster for identifying functionally coordinated TFs in two steps: (1) construction of a shared coexpression connectivity matrix (SCCM), in which each entry represents the number of shared coexpressed genes between two TFs. This sparse and symmetric matrix embodies a new concept of coexpression networks in which genes are associated in the context of other shared coexpressed genes; (2) decomposition of the SCCM using a novel heuristic algorithm termed "Triple-Link", which searches for the highest connectivity in the SCCM and then uses the two connected TFs as a primer for growing a TF cluster under a number of linking criteria. We applied TF-Cluster to microarray data from human stem cells and Arabidopsis roots, and demonstrated that many of the resulting TF clusters contain functionally coordinated TFs that, based on existing literature, accurately represent a biological process of interest.

Conclusions: TF-Cluster can be used to identify a set of TFs controlling a biological process of interest from gene expression data. Its high accuracy in recognizing true positive TFs involved in a biological process makes it extremely valuable for building the core GRNs controlling a biological process. The pipeline, implemented in Perl, can be installed on various platforms.
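
The two steps described in the Results can be illustrated with a minimal Python sketch. It is not the authors' Perl implementation. It assumes that the set of genes coexpressed with each TF has already been computed by some means the abstract does not specify, and it stores the SCCM as a sparse dictionary of shared-gene counts. The abstract does not state the Triple-Link linking criteria, so the joining rule below (`min_shared`, `min_links`) is a hypothetical stand-in, and the function names `build_sccm`, `shared_count`, and `grow_cluster` are invented for illustration.

```python
from itertools import combinations


def build_sccm(coexpressed):
    """Return {(tf_a, tf_b): shared_count} for TF pairs with a nonzero count.

    `coexpressed` maps each TF name to the set of genes coexpressed with it
    (assumed to be precomputed). Only nonzero entries are stored, so the
    matrix is sparse; keys are sorted pairs, so it is symmetric by design.
    """
    sccm = {}
    for tf_a, tf_b in combinations(sorted(coexpressed), 2):
        shared = len(coexpressed[tf_a] & coexpressed[tf_b])
        if shared:
            sccm[(tf_a, tf_b)] = shared
    return sccm


def shared_count(sccm, tf_a, tf_b):
    """Symmetric lookup; pairs that are not stored count as zero."""
    key = (tf_a, tf_b) if tf_a < tf_b else (tf_b, tf_a)
    return sccm.get(key, 0)


def grow_cluster(sccm, tfs, min_shared=2, min_links=2):
    """Seed with the highest-connectivity TF pair, then grow greedily.

    Hypothetical joining rule (NOT the paper's Triple-Link criteria): a TF
    joins when it shares at least `min_shared` genes with at least
    `min_links` current cluster members.
    """
    if not sccm:
        return []
    seed = max(sccm, key=sccm.get)  # pair with the largest SCCM entry
    cluster = list(seed)
    grew = True
    while grew:  # repeat until a full pass adds no new TF
        grew = False
        for tf in sorted(tfs):
            if tf in cluster:
                continue
            links = sum(
                shared_count(sccm, tf, member) >= min_shared
                for member in cluster
            )
            if links >= min_links:
                cluster.append(tf)
                grew = True
    return cluster
```

On a toy input in which TF1 and TF2 share three coexpressed genes and TF3 shares two genes with each of them, the seed pair is (TF1, TF2) and TF3 joins the cluster, while a TF with no shared genes is left out. The thresholds are arbitrary and would need tuning on real expression data.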
