BMC Genomics

Genome-wide transcription factor binding site/promoter databases for the analysis of gene sets and co-occurrence of transcription factor binding motifs

Abstract

Background: The use of global gene expression profiling is a well-established approach to understanding biological processes. One of the major goals of these investigations is to identify sets of genes with similar expression patterns. Such gene signatures can be very informative and can reveal new aspects of particular biological processes. A logical and systematic next step is to reduce the identified gene signatures to the regulatory components that induce the relevant gene expression changes. A central issue in this context is to identify the transcription factors, or transcription factor binding sites (TFBS), that are likely to be important for the expression of the gene signatures.

Results: We develop a strategy that efficiently produces TFBS/promoter databases based on user-defined criteria. The resulting databases comprise all genes in the Santa Cruz (UCSC) database and the positions of all TFBS that the user supplies as position weight matrices. These databases are then used for two purposes: to identify significant TFBS in the promoters of gene sets, and to identify clusters of co-occurring TFBS. We use two criteria for significance: TFBS that are significantly enriched in terms of the total number of binding sites in the promoters, and TFBS that are significantly present in terms of the fraction of promoters with binding sites. Significant TFBS are identified by a resampling procedure in which the query gene set is compared with, typically, 10⁵ gene lists of similar size drawn at random from the TFBS/promoter database. We apply this strategy to a large number of published ChIP-Chip data sets and show that the proposed approach faithfully reproduces the ChIP-Chip results. The strategy also identifies relevant TFBS when analyzing gene signatures obtained from the MSigDB database. In addition, we show that several TFBS are highly correlated and that co-occurring TFBS define functionally related sets of genes.
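To make the two significance criteria and the resampling procedure concrete, here is a minimal sketch. It is not the authors' implementation. It assumes a hypothetical in-memory layout in which the TFBS/promoter database maps each gene to the number of binding sites per TFBS in its promoter. The names `enrichment_and_presence` and `resample_pvalues` are illustrative, and so is the add-one correction on the empirical p-values. Each null sample is a random gene list of the same size as the query, and each p-value is the fraction of null samples that score at least as high as the query.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical layout (an assumption): gene -> {TFBS name -> number of sites in its promoter}
PromoterDB = Dict[str, Dict[str, int]]


def enrichment_and_presence(db: PromoterDB, genes: List[str], tfbs: str) -> Tuple[int, float]:
    """Return (total number of sites, fraction of promoters with >= 1 site) for one TFBS."""
    counts = [db[g].get(tfbs, 0) for g in genes]
    total = sum(counts)
    present = sum(1 for c in counts if c > 0) / len(genes)
    return total, present


def resample_pvalues(db: PromoterDB, query: List[str], tfbs: str,
                     n_resamples: int = 10**5, seed: int = 0) -> Tuple[float, float]:
    """Empirical one-sided p-values for enrichment and presence of `tfbs` in `query`.

    Each null sample is a gene list of the same size as the query, drawn
    without replacement from all genes in the database.
    """
    rng = random.Random(seed)
    universe = list(db)
    obs_total, obs_present = enrichment_and_presence(db, query, tfbs)
    hits_total = hits_present = 0
    for _ in range(n_resamples):
        sample = rng.sample(universe, len(query))
        total, present = enrichment_and_presence(db, sample, tfbs)
        hits_total += int(total >= obs_total)
        hits_present += int(present >= obs_present)
    # Add-one correction keeps p-values strictly positive (our assumption, not from the paper).
    return ((hits_total + 1) / (n_resamples + 1),
            (hits_present + 1) / (n_resamples + 1))
```

As a usage example, build a toy database of 200 genes in which only ten promoters carry sites for a hypothetical motif `TF_X`. Querying exactly those ten genes gives p-values near the floor of 1/(n_resamples + 1). A query of genes without sites gives p-values of 1.0. Because the empirical p-value cannot fall below 1/(n_resamples + 1), the number of resamples sets the resolution, which is one reason for using a large count such as 10⁵.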
Conclusions: The presented approach to promoter analysis faithfully reproduces the results from several ChIP-Chip- and MSigDB-derived gene sets, and may therefore prove to be an important method for analyzing gene signatures obtained through ChIP-Chip or global gene expression experiments. We show that TFBS are organized in clusters of co-occurring TFBS that together define highly coherent sets of genes.
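The abstract does not say how correlated TFBS are grouped into co-occurrence clusters. The following is a minimal sketch under two stated assumptions. First, co-occurrence is measured as the Pearson correlation between per-promoter site counts across all genes in the database. Second, TFBS are grouped by single linkage above a hypothetical threshold of 0.5, implemented with union-find. The function names, the threshold, and the toy motif names are illustrative only.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List

# Hypothetical layout (an assumption): gene -> {TFBS name -> number of sites in its promoter}
PromoterDB = Dict[str, Dict[str, int]]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def cooccurrence_clusters(db: PromoterDB, tfbs_names: List[str],
                          threshold: float = 0.5) -> List[List[str]]:
    """Group TFBS whose per-promoter site counts correlate above `threshold`.

    Single linkage: two TFBS share a cluster if a chain of pairwise
    correlations above the threshold connects them.
    """
    genes = list(db)
    vectors = {t: [db[g].get(t, 0) for g in genes] for t in tfbs_names}
    parent = {t: t for t in tfbs_names}

    def find(t: str) -> str:
        # Walk to the root of t's set, halving the path as we go.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for a, b in combinations(tfbs_names, 2):
        if pearson(vectors[a], vectors[b]) > threshold:
            parent[find(a)] = find(b)  # merge the two sets

    clusters: Dict[str, List[str]] = {}
    for t in tfbs_names:
        clusters.setdefault(find(t), []).append(t)
    return sorted(sorted(c) for c in clusters.values())
```

Single linkage is the simplest choice here. A real analysis might instead use hierarchical clustering on a correlation distance, or test each pairwise co-occurrence with the same resampling scheme used for gene sets.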