IEEE International Conference on Bioinformatics and Biomedicine

AiProAnnotator: Low-rank Approximation with network side information for high-performance, large-scale human Protein abnormality Annotator

Abstract

Annotating genes and proteins is a vital problem in biology. We focus on human proteins and medical annotation, both of which are important. The most suitable data for this annotation is the human phenotype ontology (HPO), which is sparse but reliable (well curated). Existing approaches to this problem are either feature-based or network-based. The feature-based approach can incorporate a variety of information, which makes it more appropriate for noisy data than for reliable data, while the network-based approach is not necessarily useful for sparse data. Low-rank approximation is very powerful for data that are both sparse and reliable. We therefore propose to use matrix factorization to approximate the input annotation matrix (proteins × HPO terms) by a product of low-rank factor matrices. We further incorporate network information, i.e. a protein-protein network (PPN) and a network derived from HPO (NHPO), into the matrix factorization framework as graph regularization over the two low-rank matrices. That is, the input annotation matrix is factorized into two low-rank factor matrices that are constrained to be smooth over PPN and NHPO, respectively. We call the software implementing this method “AiProAnnotator”. In this paper we examine it empirically and extensively using the latest HPO data under various experimental settings, including performance comparison under cross-validation, computation time, and case studies. The experimental results clearly showed the high predictive performance and time efficiency of AiProAnnotator.
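The abstract does not state the objective function or the optimizer, so the following is only a minimal sketch of graph-regularized matrix factorization in general, not the AiProAnnotator implementation. It assumes a squared-error loss, unnormalized graph Laplacians for the two networks, a small L2 penalty, and plain gradient descent. All function names, parameter names (alpha, beta, lam, lr), and the toy data are hypothetical.

```python
import numpy as np

def graph_laplacian(adj):
    """Unnormalized Laplacian L = D - A of a symmetric adjacency matrix."""
    return np.diag(adj.sum(axis=1)) - adj

def objective(Y, U, V, L_p, L_h, alpha, beta, lam):
    """Assumed objective: fit error + smoothness over both graphs + L2 penalty."""
    resid = Y - U @ V.T
    return (np.sum(resid ** 2)
            + alpha * np.trace(U.T @ L_p @ U)   # protein factors smooth over PPN
            + beta * np.trace(V.T @ L_h @ V)    # term factors smooth over NHPO
            + lam * (np.sum(U ** 2) + np.sum(V ** 2)))

def factorize(Y, A_p, A_h, rank=2, alpha=0.1, beta=0.1, lam=0.01,
              lr=0.01, n_iter=500, seed=0):
    """Approximate Y (proteins x terms) by U @ V.T using gradient descent."""
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    U = 0.1 * rng.standard_normal((n, rank))
    V = 0.1 * rng.standard_normal((m, rank))
    L_p, L_h = graph_laplacian(A_p), graph_laplacian(A_h)
    for _ in range(n_iter):
        resid = Y - U @ V.T
        # Gradients of the assumed objective with respect to U and V.
        grad_U = -2 * resid @ V + 2 * alpha * L_p @ U + 2 * lam * U
        grad_V = -2 * resid.T @ U + 2 * beta * L_h @ V + 2 * lam * V
        U -= lr * grad_U
        V -= lr * grad_V
    return U, V

# Toy data (made up): 4 proteins, 3 HPO terms.
Y = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 0, 1],
              [0, 0, 1]], dtype=float)
# Hypothetical PPN: proteins 0-1 interact, proteins 2-3 interact.
A_p = np.array([[0, 1, 0, 0],
                [1, 0, 0, 0],
                [0, 0, 0, 1],
                [0, 0, 1, 0]], dtype=float)
# Hypothetical NHPO: terms 0 and 1 are linked.
A_h = np.array([[0, 1, 0],
                [1, 0, 0],
                [0, 0, 0]], dtype=float)

U, V = factorize(Y, A_p, A_h)
scores = U @ V.T  # predicted protein-term association scores
```

In this sketch the entries of `scores` would be ranked per protein to suggest candidate HPO terms. The trace terms penalize differences between the factor rows of linked proteins and of linked terms, which is one common way to express the "smooth over PPN and NHPO" idea described above.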
