...
Journal: Bioinformatics

Inductive matrix completion for predicting gene-disease associations

Abstract

Motivation: Most existing methods for predicting causal disease genes rely on a specific type of evidence and are therefore limited in their applicability. More often than not, the type of evidence available for diseases varies: for example, we may know linked genes, keywords associated with the disease obtained by text mining, or the co-occurrence of disease symptoms in patients. Similarly, the type of evidence available for genes varies: for example, specific microarray probes convey information only for certain sets of genes. In this article, we apply a novel matrix-completion method called Inductive Matrix Completion to the problem of predicting gene-disease associations; it combines multiple types of evidence (features) for diseases and genes to learn latent factors that explain the observed gene-disease associations. We construct features from different biological sources, such as microarray expression data and disease-related textual data. A crucial advantage of the method is that it is inductive: it can be applied to diseases not seen at training time, unlike traditional matrix-completion approaches and network-based inference methods, which are transductive.

Results: Comparison with state-of-the-art methods on diseases from the Online Mendelian Inheritance in Man (OMIM) database shows that the proposed approach is substantially better: it has close to a one-in-four chance of recovering a true association in the top 100 predictions, compared with a less than 15% chance for the recently proposed CATAPULT method (the second best). We demonstrate that the inductive method is particularly effective for a query disease with no previously known gene associations and for predicting novel genes, i.e. genes not previously linked to any disease. Thus, the method can predict novel genes even for well-characterized diseases. We also validate the novelty of the predictions by evaluating the method on recently reported OMIM associations and on associations recently reported in the literature.
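The abstract describes Inductive Matrix Completion only in words: gene features and disease features are mapped into shared latent factors, and a gene-disease score comes from combining them. The following Python sketch illustrates that idea under stated assumptions and is not the authors' implementation. It uses the bilinear form commonly associated with IMC, P ≈ X W H^T Y^T, where X holds gene features and Y holds disease features; it assumes a squared loss over the whole matrix with unobserved pairs treated as zeros, Frobenius regularization, and alternating least squares, none of which the abstract specifies. The helper names (solve_factor, fit_imc, score_disease, recall_at_k) and the synthetic data are hypothetical. The inductive property appears in score_disease, where a disease absent from training is scored from its feature vector alone, and recall_at_k is a simplified stand-in for the abstract's "true association in the top 100 predictions" measure.

```python
import numpy as np


def solve_factor(F, T, B, lam):
    """Exactly minimise ||T - F @ M @ B.T||_F^2 + lam * ||M||_F^2 over M.

    Zeroing the gradient gives F.T F M B.T B + lam M = F.T T B, solved via the
    column-major identity vec(A M C) = kron(C.T, A) vec(M).
    """
    d, k = F.shape[1], B.shape[1]
    lhs = np.kron(B.T @ B, F.T @ F) + lam * np.eye(d * k)
    rhs = (F.T @ T @ B).flatten(order="F")
    return np.linalg.solve(lhs, rhs).reshape((d, k), order="F")


def fit_imc(P, X, Y, k=2, lam=0.1, iters=50, seed=0):
    """Alternating least squares for the bilinear model P ~ X W H^T Y^T.

    P: genes x diseases 0/1 matrix (unobserved pairs treated as 0, an assumption).
    X: genes x d1 gene features; Y: diseases x d2 disease features.
    """
    rng = np.random.default_rng(seed)
    H = rng.normal(scale=0.1, size=(Y.shape[1], k))
    for _ in range(iters):
        W = solve_factor(X, P, Y @ H, lam)    # H fixed: best W for X W (Y H)^T
        H = solve_factor(Y, P.T, X @ W, lam)  # W fixed: best H for Y H (X W)^T
    return W, H


def score_disease(W, H, X, y_new):
    """Inductive step: score every gene for a disease given only its features."""
    return X @ W @ (H.T @ y_new)


def recall_at_k(scores, true_genes, k=100):
    """Fraction of a disease's known genes that appear among the top-k scores."""
    truth = {int(g) for g in true_genes}
    top = {int(g) for g in np.argsort(-scores)[:k]}
    return len(top & truth) / len(truth)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 10))                      # synthetic gene features
    Y = rng.normal(size=(20, 6))                        # synthetic disease features
    P = (rng.random((200, 20)) < 0.05).astype(float)   # synthetic associations
    P[:10, 0] = 1.0                                     # give disease 0 some genes
    W, H = fit_imc(P[:, 1:], X, Y[1:], k=3)             # disease 0 held out
    s = score_disease(W, H, X, Y[0])                    # never seen in training
    print("recall@100 for held-out disease:", recall_at_k(s, np.flatnonzero(P[:, 0])))
```

Because each alternating step solves its least-squares subproblem exactly, the regularized objective cannot increase from one iteration to the next. The Kronecker-product solve is only practical for small feature dimensions; with realistic feature sizes, an iterative solver such as conjugate gradient or plain gradient descent would likely be preferable.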