Journal: Genome Research
Systematic Learning of Gene Functional Classes From DNA Array Expression Data by Using Multilayer Perceptrons

Abstract

Recent advances in microarray technology have opened new ways for functional annotation of previously uncharacterised genes on a genomic scale. This has been demonstrated by unsupervised clustering of co-expressed genes and, more importantly, by supervised learning algorithms. Using prior knowledge, these algorithms can assign functional annotations based on more complex expression signatures found in existing functional classes. Previously, support vector machines (SVMs) and other machine-learning methods have been applied to a limited number of functional classes for this purpose. Here we present, for the first time, the comprehensive application of supervised neural networks (SNNs) for functional annotation. Our study is novel in that we report systematic results for ∼100 classes in the Munich Information Center for Protein Sequences (MIPS) functional catalog. We found that only ∼10% of these are learnable (based on the rate of false negatives). A closer analysis reveals that false positives (and negatives) in a machine-learning context are not necessarily “false” in a biological sense. We show that the high degree of interconnection among functional classes confounds the signatures that ought to be learned for a unique class. We term this the “Borges effect” and introduce two new numerical indices for its quantification. Our analysis indicates that classification systems with a lower Borges effect are better suited to machine learning. Furthermore, we introduce a learning procedure for combining false positives with the original class. We show that in a few iterations this process converges to a gene set that is learnable with considerably low rates of false positives and negatives and that contains genes biologically related to the original class, allowing for a coarse reconstruction of the interactions between associated biological pathways. We exemplify this methodology using the well-studied tricarboxylic acid cycle.
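
The iterative procedure mentioned in the abstract (merging false positives into the original class and relearning) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the network architecture or the stopping rule, so the sketch swaps in a nearest-centroid classifier as a stand-in for the multilayer perceptron and assumes the loop stops once no new false positives appear. The gene names, expression values, and the function names `predict_members` and `expand_class` are all hypothetical.

```python
from statistics import mean


def centroid(rows):
    """Component-wise mean of a list of equal-length expression profiles."""
    return [mean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_members(profiles, members):
    """Stand-in classifier (NOT an MLP): a gene is predicted to be in the class
    when its profile is closer to the centroid of the current members than to
    the centroid of all remaining genes."""
    inside = [profiles[g] for g in members]
    outside = [p for g, p in profiles.items() if g not in members]
    c_in, c_out = centroid(inside), centroid(outside)
    return {g for g, p in profiles.items() if sq_dist(p, c_in) < sq_dist(p, c_out)}


def expand_class(profiles, seed, max_iter=10):
    """Repeatedly add false positives to the class and relearn.
    Assumed stopping rule: stop when an iteration yields no new false positives."""
    members = set(seed)
    for _ in range(max_iter):
        predicted = predict_members(profiles, members)
        false_pos = predicted - members  # predicted in class, not annotated
        if not false_pos:
            break
        members |= false_pos
    return members


# Hypothetical toy data: six genes measured under three conditions.
profiles = {
    "g1": [1.0, 0.9, 1.1],
    "g2": [0.9, 1.0, 1.0],
    "g3": [0.8, 0.9, 0.9],    # co-expressed with g1/g2 but not annotated
    "g4": [-1.0, -0.9, -1.1],
    "g5": [-0.9, -1.0, -1.0],
    "g6": [-1.1, -1.0, -0.9],
}
seed = {"g1", "g2"}           # the originally annotated class

result = expand_class(profiles, seed)
print(sorted(result))
```

In this toy case the unannotated gene g3 is absorbed into the class after the first pass and the set is stable on the second, which mirrors the idea that a "false" positive may be biologically related to the original class. Replacing `predict_members` with a trained neural network would keep the outer loop unchanged.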