International Conference on Intelligent Systems for Molecular Biology; 2000-08-16 to 2000-08-23; La Jolla, CA (US)

Mining for putative regulatory elements in the yeast genome using gene expression data

Abstract

We have developed a set of methods and tools for automatic discovery of putative regulatory signals in genome sequences. The analysis pipeline consists of gene expression data clustering, sequence pattern discovery from upstream sequences of genes, a control experiment to determine the pattern significance threshold, selection of interesting patterns, grouping of these patterns, representing the pattern groups in a concise form, and evaluating the discovered putative signals against existing databases of regulatory signals. Pattern discovery is computationally the most expensive and the most crucial step. Our tool performs a rapid exhaustive search for a priori unknown, statistically significant sequence patterns of unrestricted length. The statistical significance is determined for the set of sequences in each cluster with respect to a set of background sequences, allowing the detection of subtle regulatory signals specific to each cluster. The potentially large number of significant patterns is reduced to a small number of groups by clustering them by mutual similarity. Automatically derived consensus patterns of these groups represent the results in a comprehensive way for a human investigator. We have performed a systematic analysis for the yeast Saccharomyces cerevisiae. We created a large number of independent clusterings of expression data, simultaneously assessing the "goodness" of each cluster. For each of the over 52,000 clusters acquired in this way, we discovered significant patterns in the upstream sequences of the respective genes. We selected nearly 1500 significant patterns by formal criteria and matched them against the experimentally mapped transcription factor binding sites in the SCPD database. We clustered the 1500 patterns into 62 groups, for which we automatically derived alignments and consensus patterns. Of these 62 groups, 48 had patterns that have matching sites in the SCPD database.
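
The idea of scoring a pattern for one cluster against a set of background sequences can be illustrated with a minimal sketch. It is not the authors' tool: the abstract does not name the statistic or the search algorithm, so the hypergeometric tail probability, the restriction to fixed-length substrings, and the names `hypergeom_tail`, `count_kmers_by_sequence` and `enriched_patterns` are all assumptions made for illustration only.

```python
from math import comb


def hypergeom_tail(k, n, K, N):
    """P(X >= k): probability that at least k of n sequences drawn from N
    contain the pattern, when K of the N sequences contain it."""
    total = comb(N, n)
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def count_kmers_by_sequence(seqs, k):
    """For each substring of length k, count how many sequences contain it."""
    counts = {}
    for s in seqs:
        # A set ensures each sequence is counted at most once per substring.
        for kmer in {s[i:i + k] for i in range(len(s) - k + 1)}:
            counts[kmer] = counts.get(kmer, 0) + 1
    return counts


def enriched_patterns(cluster, background, k, alpha):
    """Return (pattern, cluster_count, background_count, p_value) tuples for
    substrings over-represented in the cluster, sorted by p-value.

    Assumption: `background` includes the cluster sequences themselves.
    """
    n, N = len(cluster), len(background)
    bg_counts = count_kmers_by_sequence(background, k)
    cl_counts = count_kmers_by_sequence(cluster, k)
    results = []
    for kmer, c in cl_counts.items():
        p = hypergeom_tail(c, n, bg_counts[kmer], N)
        if p <= alpha:
            results.append((kmer, c, bg_counts[kmer], p))
    return sorted(results, key=lambda r: r[3])


# Toy upstream sequences (made up for this example, not real yeast data).
cluster = ["ACGTGA", "TTACGT", "ACGTCC"]
background = cluster + ["GGGGGG", "TTTTTT", "CCCCCC", "AAAAAA",
                        "GATTAC", "CAGGTT", "TGCATG"]
for pattern, c, b, p in enriched_patterns(cluster, background, k=4, alpha=0.05):
    print(pattern, c, b, round(p, 4))
```

In this toy input only `ACGT` passes the threshold, because it occurs in all three cluster sequences and nowhere else in the background. The abstract describes an exhaustive search over patterns of unrestricted length with a threshold set by a control experiment; the fixed `k` and fixed `alpha` here are simplifications.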
