Journal: BMC Bioinformatics

Predicting gene ontology from a global meta-analysis of 1-color microarray experiments


Abstract

Background: Global meta-analysis (GMA) of microarray data to identify genes with highly similar co-expression profiles is emerging as an accurate method to predict gene function and phenotype, even in the absence of published data on the gene(s) being analyzed. With a third of human genes still uncharacterized, this approach is a promising way to direct experiments and rapidly understand the biological roles of genes. To predict function for genes of interest, GMA relies on a guilt-by-association approach to identify sets of genes with known functions that are consistently co-expressed with them across different experimental conditions, suggesting coordinated regulation for a specific biological purpose. Our goal here is to define how sample, dataset size and ranking parameters affect prediction performance.

Results: 13,000 human 1-color microarrays were downloaded from GEO for GMA analysis. Prediction performance was benchmarked by calculating the distance within the Gene Ontology (GO) tree between predicted function and annotated function for sets of 100 randomly selected genes. We find the number of new predicted functions rises as more datasets are added, but begins to saturate at a sample size of approximately 2,000 experiments. For the gene set used to predict function, we find precision to be higher with smaller set sizes, yet with correspondingly poor recall and, as set size is increased, recall and F-measure also tend to increase but at the cost of precision.

Conclusions: Of the 20,813 genes expressed in 50 or more experiments, at least one predicted GO category was found for 72.5% of them. Of the 5,720 genes without GO annotation, 4,189 had at least one predicted ontology using top 40 co-expressed genes for prediction analysis. For the remaining 1,531 genes without GO predictions or annotations, ~17% (257 genes) had sufficient co-expression data yet no statistically significantly overrepresented ontologies, suggesting their regulation may be more complex.
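The guilt-by-association idea in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not say which similarity measure or enrichment test was used, so Pearson correlation, a one-sided hypergeometric test, and the 0.05 cutoff are assumptions, and every function name is hypothetical. Only the default set size of 40 comes from the abstract. A real analysis would also correct for multiple testing across GO terms.

```python
from math import comb, sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles (assumed measure)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def top_coexpressed(gene, profiles, k):
    """Rank every other gene by correlation with `gene` and keep the top k."""
    scores = {g: pearson(profiles[gene], p) for g, p in profiles.items() if g != gene}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def hypergeom_tail(hits, draws, annotated, population):
    """P(X >= hits) when `draws` genes are sampled from `population`,
    of which `annotated` carry the GO term (assumed enrichment test)."""
    upper = min(draws, annotated)
    total = comb(population, draws)
    return sum(
        comb(annotated, i) * comb(population - annotated, draws - i)
        for i in range(hits, upper + 1)
    ) / total


def predict_terms(gene, profiles, annotations, k=40, alpha=0.05):
    """Return GO terms overrepresented among the top-k co-expressed genes.
    k=40 follows the abstract; alpha=0.05 is an illustrative assumption."""
    neighbours = top_coexpressed(gene, profiles, k)
    background = [g for g in profiles if g != gene]
    candidates = set().union(*(annotations.get(g, set()) for g in neighbours))
    predicted = set()
    for term in candidates:
        hits = sum(term in annotations.get(g, set()) for g in neighbours)
        annotated = sum(term in annotations.get(g, set()) for g in background)
        if hypergeom_tail(hits, len(neighbours), annotated, len(background)) < alpha:
            predicted.add(term)
    return predicted


def precision_recall_f(predicted, known):
    """Precision, recall and F-measure of predicted terms against known annotations."""
    tp = len(predicted & known)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(known) if known else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure
```

Here `profiles` maps each gene to its expression values across experiments and `annotations` maps each gene to a set of GO term identifiers. Varying `k` in such a sketch is one way to explore the trade-off the abstract reports between precision at small set sizes and recall at larger ones.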
