Venue: IEEE International Conference on Research, Innovation and Vision for the Future

Simple but effective methods for combining kernels in computational biology

Abstract

Complex biological data generated by various experiments are stored as diverse data types across multiple datasets. By representing each biological dataset as a kernel matrix and then combining the matrices to solve a problem, the kernel-based approach has become prominent in data integration, with applications in bioinformatics and other fields. While the linear combination of unweighted multiple kernels (UMK) is popular, there has also been work on multiple kernel learning (MKL), in which optimal weights are learned by semi-definite programming or sequential minimal optimization (SMO-MKL). These methods give high accuracy on biological prediction problems, but they are complicated and hard to use, especially for non-experts in optimization. They also usually have a high computational cost and are not suitable for large data sets. In this paper, we propose two simple but effective methods for determining the weights of a conic combination of multiple kernels. The first, denoted FSM-MKL, learns optimal weights formulated in terms of FSM, our feature space-based measure for kernel matrix evaluation. The second, named proportionally weighted multiple kernels (PWMK), assigns each kernel a weight proportional to its quality, as determined by direct cross validation. An experimental comparison of the four methods UMK, SMO-MKL, FSM-MKL and PWMK on the problem of protein-protein interactions shows that our proposed methods are simpler and more efficient while remaining effective. They achieved performance almost as high as that of MKL and higher than that of UMK.
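
To make the weighting idea concrete, here is a minimal sketch of a conic combination K = w_1 K_1 + ... + w_m K_m with all w_i >= 0, covering the UMK case (equal weights) and a PWMK-style case (weights proportional to a cross-validation score). The abstract does not give the exact formulas, so several details are assumptions: the function names are hypothetical, the weights are normalized to sum to 1, and a kernel ridge classifier stands in for whatever classifier the paper actually cross-validates. FSM-MKL is not sketched because the abstract does not define the FSM measure.

```python
import numpy as np


def combine_kernels(kernels, weights):
    """Conic combination sum_i w_i * K_i, which requires every w_i >= 0."""
    weights = np.asarray(weights, dtype=float)
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    if np.any(weights < 0):
        raise ValueError("a conic combination requires non-negative weights")
    return sum(w * K for w, K in zip(weights, kernels))


def cv_accuracy(K, y, n_folds=5, ridge=1.0, seed=0):
    """k-fold accuracy on a precomputed kernel K with labels y in {-1, +1}.

    Assumption: a kernel ridge classifier is used here as a simple
    stand-in; the paper's own classifier is not stated in the abstract.
    """
    y = np.asarray(y, dtype=float)
    idx = np.random.default_rng(seed).permutation(len(y))
    correct = 0
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        # Fit on the training block: solve (K_tr + ridge * I) alpha = y_tr.
        K_tr = K[np.ix_(train, train)]
        alpha = np.linalg.solve(K_tr + ridge * np.eye(len(train)), y[train])
        # Predict held-out points from their kernel values against training points.
        pred = np.sign(K[np.ix_(test, train)] @ alpha)
        correct += int(np.sum(pred == y[test]))
    return correct / len(y)


def proportional_weights(scores):
    """Weights proportional to per-kernel quality scores, normalized to sum to 1."""
    scores = np.asarray(scores, dtype=float)
    if np.any(scores < 0) or scores.sum() == 0:
        raise ValueError("scores must be non-negative and not all zero")
    return scores / scores.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_a = rng.normal(size=(40, 3))            # hypothetical data source A
    y = np.where(X_a[:, 0] >= 0, 1.0, -1.0)   # toy labels derived from source A
    X_b = rng.normal(size=(40, 3))            # hypothetical source B, unrelated to y
    kernels = [X_a @ X_a.T, X_b @ X_b.T]      # one linear kernel per source

    scores = [cv_accuracy(K, y) for K in kernels]
    K_umk = combine_kernels(kernels, np.ones(len(kernels)) / len(kernels))
    K_pwmk = combine_kernels(kernels, proportional_weights(scores))
    print("CV accuracy per kernel:", scores)
    print("PWMK-style weights:", proportional_weights(scores))
```

In this sketch each kernel is scored on its own, so the cost is one cross-validation run per kernel plus a weighted sum, with no joint optimization over the weights. That is the sense in which a proportional scheme is simpler than solving an MKL optimization problem.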
