Conference: International Conference on Intelligent Systems for Molecular Biology
Mining for putative regulatory elements in the yeast genome using gene expression data


Abstract

We have developed a set of methods and tools for automatic discovery of putative regulatory signals in genome sequences. The analysis pipeline consists of gene expression data clustering, sequence pattern discovery from upstream sequences of genes, a control experiment for determining the pattern significance threshold, selection of interesting patterns, grouping of these patterns, representing the pattern groups in a concise form, and evaluating the discovered putative signals against existing databases of regulatory signals. The pattern discovery is computationally the most expensive and crucial step. Our tool performs a rapid exhaustive search for a priori unknown statistically significant sequence patterns of unrestricted length. The statistical significance is determined for a set of sequences in each cluster with respect to a set of background sequences, allowing the detection of subtle regulatory signals specific for each cluster. The potentially large number of significant patterns is reduced to a small number of groups by clustering them by mutual similarity. Automatically derived consensus patterns of these groups represent the results in a comprehensive way for a human investigator. We have performed a systematic analysis for the yeast Saccharomyces cerevisiae. We created a large number of independent clusterings of expression data, simultaneously assessing the "goodness" of each cluster. For each of the over 52000 clusters acquired in this way, we discovered significant patterns in the upstream sequences of the respective genes. We selected nearly 1500 significant patterns by formal criteria and matched them against the experimentally mapped transcription factor binding sites in the SCPD database. We clustered the 1500 patterns into 62 groups, for which we automatically derived alignments and consensus patterns. Of these 62 groups, 48 had patterns with matching sites in the SCPD database.
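The pattern discovery step described in the abstract (scoring patterns in a cluster's upstream sequences against a set of background sequences) can be illustrated with a minimal Python sketch. It rests on assumptions that are not from the abstract: the hypergeometric tail probability stands in for the significance measure, which the abstract does not name; pattern length is capped by a hypothetical `max_len` parameter, whereas the authors' tool searches patterns of unrestricted length; and `background` is assumed to contain the cluster sequences. The function names and the toy sequences are illustrative only.

```python
from collections import Counter
from math import comb


def substrings_present(seq, max_len):
    """Return the set of all substrings of seq with length 1..max_len."""
    found = set()
    for i in range(len(seq)):
        for j in range(i + 1, min(len(seq), i + max_len) + 1):
            found.add(seq[i:j])
    return found


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n sequences are drawn from N, of which K contain the pattern."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total


def significant_patterns(cluster, background, max_len=6, threshold=0.05):
    """Score every substring found in the cluster against the background.

    Assumption: `background` includes the cluster sequences themselves.
    Returns (pattern, cluster_count, background_count, p_value) tuples,
    sorted by p-value.
    """
    in_cluster = Counter()
    for seq in cluster:
        in_cluster.update(substrings_present(seq, max_len))
    in_background = Counter()
    for seq in background:
        in_background.update(substrings_present(seq, max_len))

    results = []
    for pattern, k in in_cluster.items():
        p = hypergeom_tail(k, len(cluster), in_background[pattern], len(background))
        if p <= threshold:
            results.append((pattern, k, in_background[pattern], p))
    return sorted(results, key=lambda r: r[3])


# Toy data (made up for illustration, not yeast sequences).
cluster = ["AACGTT", "GGCGTA", "TCGTCC"]
background = cluster + ["AAAAAA", "TTTTTT", "GGGGGG", "CCCCCC", "ATATAT", "GAGAGA"]

for pattern, k, K, p in significant_patterns(cluster, background, max_len=4, threshold=0.02):
    print(pattern, k, K, round(p, 4))
```

Counting each pattern at most once per sequence keeps the test a statement about how many sequences carry the pattern, which matches the per-cluster versus background comparison in the abstract. At the scale the abstract reports (over 52000 clusters), enumerating substrings naively like this would be too slow, and a real threshold would need the kind of control experiment the abstract mentions.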
