Journal: BMC Bioinformatics

Integrated biclustering of heterogeneous genome-wide datasets for the inference of global regulatory networks


Abstract

Background: Learning global genetic regulatory networks from expression data is a severely under-constrained problem. It is aided by reducing the dimensionality of the search space, which can be done by clustering genes into putatively co-regulated groups, as opposed to groups that are simply co-expressed. Because genes may be co-regulated only across a subset of all observed experimental conditions, biclustering (clustering of genes and conditions) is more appropriate than standard clustering. Co-regulated genes are also often functionally (physically, spatially, genetically, and/or evolutionarily) associated, and such a priori known or pre-computed associations can provide support for appropriately grouping genes. One important association is the presence of one or more common cis-regulatory motifs. In organisms where these motifs are not known, their de novo detection, integrated into the clustering algorithm, can help to guide the process towards more biologically parsimonious solutions.

Results: We have developed an algorithm, cMonkey, that detects putative co-regulated gene groupings by integrating the biclustering of gene expression data and various functional associations with the de novo detection of sequence motifs.

Conclusion: We have applied this procedure to the archaeon Halobacterium NRC-1, as part of our efforts to decipher its regulatory network. In addition, we used cMonkey on public data for three organisms in the other two domains of life: Helicobacter pylori, Saccharomyces cerevisiae, and Escherichia coli. The biclusters detected by cMonkey both recapitulated known biology and enabled novel predictions (some for Halobacterium were subsequently confirmed in the laboratory). For example, it identified the bacteriorhodopsin regulon, assigned additional genes with apparently unrelated function to this regulon, and detected its known promoter motif. We have performed a thorough comparison of cMonkey results against other clustering methods, and find that cMonkey biclusters are more parsimonious with all available evidence for co-regulation.
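
The point that genes may be coherent only across a subset of conditions can be illustrated with a small sketch. The code below is not cMonkey and does not reproduce its scoring. It assumes a toy expression matrix invented for illustration and uses the mean squared residue of Cheng and Church, a generic bicluster coherence measure, as a stand-in. The function name `mean_squared_residue` and all data values are hypothetical.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue (Cheng-Church) of the submatrix rows x cols.

    Low values mean the selected genes vary coherently (additively)
    across the selected conditions.
    """
    sub = [[matrix[r][c] for c in cols] for r in rows]
    n_rows, n_cols = len(rows), len(cols)
    row_means = [sum(row) / n_cols for row in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall_mean = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall_mean
            total += residue ** 2
    return total / (n_rows * n_cols)


# Toy data (assumed): rows are genes, columns are conditions.
# Genes 0-2 share an additive pattern only in conditions 0-2.
expression = [
    [1.0, 2.0, 3.0, 9.0, 0.5],
    [2.0, 3.0, 4.0, 0.1, 7.0],
    [0.0, 1.0, 2.0, 5.0, 5.5],
    [6.0, 0.2, 8.0, 1.0, 3.0],
]

genes = [0, 1, 2]
bicluster_score = mean_squared_residue(expression, genes, [0, 1, 2])
all_conditions_score = mean_squared_residue(expression, genes, [0, 1, 2, 3, 4])

print("subset of conditions:", bicluster_score)
print("all conditions:", all_conditions_score)
```

In this toy case the three genes score as perfectly coherent on the first three conditions, while the same genes scored over all five conditions look much less coherent. A standard clustering that compares genes across every condition would therefore tend to miss the grouping that a bicluster captures.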
