Journal: BMC Bioinformatics
ProDis-ContSHC: learning protein dissimilarity measures and hierarchical context coherently for protein-protein comparison in protein database retrieval


Abstract

Background: The need to retrieve or classify protein molecules using structure- or sequence-based similarity measures underlies a wide range of biomedical applications. Traditional protein search methods rely on a pairwise dissimilarity/similarity measure for comparing a pair of proteins. Such pairwise measures neglect the distribution of the other proteins in the database and therefore cannot meet the accuracy requirements of retrieval systems. Recent work in the machine learning community has shown that exploiting the global structure of the database and learning contextual dissimilarity/similarity measures can improve retrieval performance significantly. However, most existing contextual dissimilarity/similarity learning algorithms work in an unsupervised manner and do not use the known class labels of the proteins in the database.

Results: In this paper, we propose a novel protein-protein dissimilarity learning algorithm, ProDis-ContSHC. ProDis-ContSHC regularizes an existing dissimilarity measure $d_{ij}$ by considering the contextual information of the proteins. The context of a protein is defined by its neighboring proteins. The basic idea is that, for a pair of proteins (i, j), if their contexts are similar to each other, the two proteins should also have a high similarity. We implement this idea by regularizing $d_{ij}$ by a factor learned from the contexts of the two proteins. Moreover, we divide the context into hierarchical sub-contexts and obtain a contextual dissimilarity vector for each protein pair. Using the class label information of the proteins, we select relevant protein pairs (pairs that share the same class label) and irrelevant protein pairs (pairs with different labels), and train an SVM model to distinguish between their contextual dissimilarity vectors. The SVM model is further used to learn a supervised regularizing factor. Finally, with the new Supervised learned Dissimilarity measure, we update the Protein Hierarchical Context Coherently in an iterative algorithm, ProDis-ContSHC. We test the performance of ProDis-ContSHC on two benchmark sets, the ASTRAL 1.73 database and the FSSP/DALI database. Experimental results demonstrate that retrieval systems using our supervised contextual dissimilarity measures significantly outperform those using context-free dissimilarity/similarity measures or other unsupervised contextual dissimilarity measures that do not use the class label information.

Conclusions: By using the contextual proteins together with their class labels in the database, we can dramatically improve the accuracy of pairwise dissimilarity/similarity measures for protein retrieval tasks. In this work, we propose for the first time the idea of supervised contextual dissimilarity learning, resulting in the ProDis-ContSHC algorithm. Among the different contextual dissimilarity learning approaches that can be used to compare a pair of proteins, ProDis-ContSHC provides the highest accuracy. Finally, ProDis-ContSHC compares favorably with other methods reported in the recent literature.
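
The idea of a contextual dissimilarity vector can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how sub-contexts are built or how the factor is computed, so the nested k-nearest-neighbour sub-contexts, the mean-based level score, the fixed `weights` (standing in for what the SVM would learn), and all function names below are assumptions made purely for illustration.

```python
from statistics import mean

def context(D, i, k):
    """Indices of the k nearest neighbours of protein i (i itself excluded)."""
    others = [j for j in range(len(D)) if j != i]
    return sorted(others, key=lambda j: D[i][j])[:k]

def sub_contexts(D, i, k, levels):
    """Assumption: nested sub-contexts of growing size cut from the k-NN list."""
    nbrs = context(D, i, k)
    step = k // levels
    return [nbrs[: step * (level + 1)] for level in range(levels)]

def contextual_vector(D, i, j, k=2, levels=2):
    """One entry per level: mean dissimilarity between the two sub-contexts."""
    ci = sub_contexts(D, i, k, levels)
    cj = sub_contexts(D, j, k, levels)
    return [mean(D[a][b] for a in si for b in sj) for si, sj in zip(ci, cj)]

def regularized(D, i, j, weights, k=2, levels=2):
    """Scale d_ij by a factor derived from the contextual vector.

    `weights` are hypothetical; in the paper the factor is learned with an SVM.
    """
    v = contextual_vector(D, i, j, k, levels)
    factor = sum(w * x for w, x in zip(weights, v))
    return D[i][j] * factor

# Toy data: six "proteins" on a line, forming two well-separated groups.
positions = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
D = [[abs(p - q) for q in positions] for p in positions]

same_group = regularized(D, 0, 1, weights=[0.5, 0.5])
cross_group = regularized(D, 0, 3, weights=[0.5, 0.5])
print(same_group, cross_group)
```

In this toy setting, pairs whose neighbourhoods overlap end up with a smaller regularized dissimilarity than pairs drawn from different groups, which is the qualitative behaviour the abstract describes. A supervised variant would replace the fixed weights with a model trained on the contextual vectors of relevant and irrelevant pairs.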