Journal: SIGKDD Explorations

Transductive Multi-label Ensemble Classification for Protein Function Prediction



Abstract

Advances in biotechnology have made large amounts of heterogeneous proteomic and genomic data available. Integrating these heterogeneous data sources to automatically infer the function of proteins is a fundamental challenge in computational biology. Several approaches represent each data source with a kernel (similarity) function. The resulting kernels are then integrated into a composite kernel, which is used to build a function prediction model. Proteins are also known to have multiple roles and functions, so several approaches cast the protein function prediction problem within a multi-label learning framework. In our work, we develop an approach that takes advantage of several unlabeled proteins, along with multiple data sources and multiple functions of proteins. We develop a graph-based transductive multi-label classifier (TMC) that is evaluated on a composite kernel, and we also propose a method for data integration using the ensemble framework, called the transductive multi-label ensemble classifier (TMEC). The TMEC approach trains a graph-based multi-label classifier on each individual kernel and then combines the predictions of the individual models. Our contribution is the use of a bi-relational directed graph that captures relationships between pairs of proteins, between pairs of functions, and between proteins and functions. We evaluate the ability of TMC and TMEC to predict protein functions on two yeast datasets. We show that our approach performs better than recently proposed protein function prediction methods on composite and multiple kernels.
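The contrast between classifying on a composite kernel and combining per-kernel classifiers can be illustrated with a minimal sketch. It is not the authors' TMC or TMEC: it omits the bi-relational directed graph and substitutes ordinary closed-form label propagation on a protein-protein similarity graph as the transductive base learner. The function names, the `alpha` parameter, the simple averaging rule, and the toy kernels are all assumptions made for illustration.

```python
import numpy as np


def normalize_kernel(K):
    """Symmetrically normalize a similarity matrix: D^{-1/2} K D^{-1/2}."""
    K = np.array(K, dtype=float)
    np.fill_diagonal(K, 0.0)          # ignore self-similarity
    d = K.sum(axis=1)
    d[d == 0] = 1.0                   # guard against isolated nodes
    inv_sqrt = 1.0 / np.sqrt(d)
    return K * inv_sqrt[:, None] * inv_sqrt[None, :]


def propagate_labels(K, Y, alpha=0.5):
    """Stand-in transductive learner (assumption, not the paper's TMC).

    Solves F = (1 - alpha) * (I - alpha * S)^{-1} Y, where S is the
    normalized kernel and rows of Y are zero for unlabeled proteins.
    """
    S = normalize_kernel(K)
    n = S.shape[0]
    return (1.0 - alpha) * np.linalg.solve(np.eye(n) - alpha * S, Y)


def composite_predict(kernels, Y, alpha=0.5):
    """Average the kernels first, then run one classifier on the result."""
    K = sum(kernels) / len(kernels)
    return propagate_labels(K, Y, alpha)


def ensemble_predict(kernels, Y, alpha=0.5):
    """Run one classifier per kernel, then average the predicted scores."""
    scores = [propagate_labels(K, Y, alpha) for K in kernels]
    return np.mean(scores, axis=0)


def toy_kernel(cross):
    """Hypothetical data source: two clusters of three proteins each."""
    K = np.full((6, 6), cross)
    K[:3, :3] = 1.0
    K[3:, 3:] = 1.0
    return K


# Two toy data sources; proteins 0 and 3 are labeled, the rest are not.
kernels = [toy_kernel(0.1), toy_kernel(0.3)]
Y = np.zeros((6, 2))
Y[0, 0] = 1.0   # protein 0 has function 0
Y[3, 1] = 1.0   # protein 3 has function 1

F_comp = composite_predict(kernels, Y)
F_ens = ensemble_predict(kernels, Y)
print(np.round(F_comp, 3))
print(np.round(F_ens, 3))
```

Each row of the output holds one protein's scores for the two functions. In this toy setup, unlabeled proteins score higher for the function of the labeled protein in their own cluster under both strategies. A multi-label prediction would then come from thresholding or ranking the scores per protein.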
