Conference: IEEE International Conference on Bioinformatics and Biomedicine

Online Multi-Instance Multi-Label learning for protein function prediction

Abstract

Protein function prediction is a challenging and essential research problem in computational biology. Typically, a protein consists of a number of structural domains and performs multiple functions. By representing proteins as bags, domains as instances and functions as classes, we can model protein function prediction as a Multi-Instance Multi-Label (MIML) learning problem. Existing MIML algorithms mainly focus on the batch setting, where all training examples are available before learning. Such an offline paradigm works well in simulation, but it may not be feasible for real-world online applications where data arrive one by one or chunk by chunk. In this paper, we investigate protein function prediction under a new learning framework, called Online Multi-Instance Multi-Label (OMIML) learning, in which MIML protein examples arrive sequentially, and develop two OMIML algorithms (OMIML-I and OMIML-B) to make predictions for the incoming data. In the proposed algorithms, variable-length features are constructed to represent the MIML protein examples based on an incremental vocabulary mechanism. In particular, the incremental vocabulary of OMIML-I consists of instances, while that of OMIML-B consists of bags. We then obtain an online prediction for each newly arrived protein example by feeding the constructed features into an online multi-label learning algorithm, which is built by introducing an artificial label into an online multi-label ranking model. We evaluate the algorithms on a protein dataset covering seven real-world organisms. The experimental results demonstrate the effectiveness of the proposed OMIML algorithms for protein function prediction.
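
To make the ideas in the abstract concrete, the following is a minimal sketch of an instance-vocabulary online MIML learner. It is not the authors' OMIML-I implementation: the abstract does not give the feature map or the update rule, so the similarity function (exp of negative squared distance to the closest instance in the bag), the perceptron-style ranking update, the learning rate, and every name in the code (`bag_features`, `OnlineMIMLSketch`, `bag_a`, `bag_b`) are assumptions chosen for illustration. What it does reflect from the abstract is that the feature vector grows with an incremental vocabulary, and that an artificial label acts as the cut-off between relevant and irrelevant labels in a ranking model.

```python
import math
from typing import List, Sequence, Set, Tuple

Instance = Tuple[float, ...]
Bag = Sequence[Sequence[float]]


def bag_features(bag: Bag, vocabulary: List[Instance]) -> List[float]:
    """One feature per vocabulary entry, so the length grows with the vocabulary."""
    feats = []
    for proto in vocabulary:
        # Assumed similarity: the closest instance in the bag decides the feature.
        d2 = min(sum((a - b) ** 2 for a, b in zip(inst, proto)) for inst in bag)
        feats.append(math.exp(-d2))
    return feats


class OnlineMIMLSketch:
    """Hypothetical online learner with an instance-level incremental vocabulary."""

    def __init__(self, n_labels: int, lr: float = 0.5) -> None:
        self.n_labels = n_labels
        self.lr = lr
        self.vocabulary: List[Instance] = []
        # One weight row per real label, plus a last row for the artificial label.
        self.weights: List[List[float]] = [[] for _ in range(n_labels + 1)]

    def _scores(self, feats: List[float]) -> List[float]:
        return [sum(w * f for w, f in zip(row, feats)) for row in self.weights]

    def predict(self, bag: Bag) -> Set[int]:
        if not self.vocabulary:
            return set()
        scores = self._scores(bag_features(bag, self.vocabulary))
        # Labels ranked above the artificial label are predicted as relevant.
        return {k for k in range(self.n_labels) if scores[k] > scores[-1]}

    def update(self, bag: Bag, labels: Set[int]) -> None:
        # Grow the vocabulary with unseen instances and pad every weight row.
        for inst in bag:
            key = tuple(inst)
            if key not in self.vocabulary:
                self.vocabulary.append(key)
                for row in self.weights:
                    row.append(0.0)
        feats = bag_features(bag, self.vocabulary)
        scores = self._scores(feats)
        art = self.n_labels
        for k in range(self.n_labels):
            if k in labels and scores[k] <= scores[art]:
                sign = 1.0   # relevant label should outrank the artificial label
            elif k not in labels and scores[k] >= scores[art]:
                sign = -1.0  # irrelevant label should rank below it
            else:
                continue
            for j, f in enumerate(feats):
                self.weights[k][j] += sign * self.lr * f
                self.weights[art][j] -= sign * self.lr * f


# Toy stream: two "proteins" (bags of 2-D domain vectors) with one function each.
bag_a = [(0.0, 0.0), (0.0, 1.0)]
bag_b = [(5.0, 5.0), (5.0, 6.0)]
model = OnlineMIMLSketch(n_labels=2)
for bag, labels in [(bag_a, {0}), (bag_b, {1})]:
    model.predict(bag)         # predict first, as in an online protocol
    model.update(bag, labels)  # then learn from the revealed labels
```

A bag-level vocabulary, as in OMIML-B, would store whole bags as vocabulary entries and would need a bag-to-bag similarity in place of the instance-to-bag one assumed here.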
