NIH literature > PLoS Clinical Trials

Searching Remote Homology with Spectral Clustering with Symmetry in Neighborhood Cluster Kernels



Abstract

Remote homology detection among proteins using only unlabelled sequences is a central problem in comparative genomics. Existing cluster kernel methods based on neighborhoods and profiles, together with Markov clustering algorithms, are currently the most popular methods for protein family recognition. However, these methods either deviate from random walks through inflation or depend on a hard threshold in the similarity measure, so they need enhancement for detecting homology among multi-domain proteins. We propose combining spectral clustering with neighborhood kernels in Markov similarity to increase sensitivity in detecting homology independent of “recent” paralogs. The spectral clustering approach with new combined local alignment kernels exploits the unsupervised protein sequences more effectively at a global level, reducing inter-cluster walks. When combined with corrections based on a modified symmetry-based proximity norm that deemphasizes outliers, the proposed technique outperforms the other state-of-the-art cluster kernels among all twelve implemented kernels. Comparison with state-of-the-art string and mismatch kernels also shows superior performance scores for the proposed kernels. A similar performance improvement is also found on an existing large dataset. Therefore, the proposed spectral clustering framework over combined local alignment kernels with modified symmetry-based correction achieves superior performance for unsupervised remote homolog detection, with better biological relevance, even for multi-domain and promiscuous-domain proteins from Genolevures database families. Source code available upon request. Contact: .
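The abstract gives no formulas, so the following is only a minimal sketch of the generic neighborhood kernel from the cluster-kernel literature (in the style of Weston et al., 2005) that approaches like this one build on. It is not the authors' combined local alignment kernel, Markov similarity, spectral step, or symmetry-based correction. It assumes a precomputed, symmetric base similarity matrix with positive diagonal (for example, normalized local-alignment scores) and a hypothetical `threshold` that defines each sequence's neighborhood; the function names are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def neighborhood_kernel(base: Matrix, threshold: float) -> Matrix:
    """Average the base kernel over the neighborhoods of both sequences.

    Assumption (hypothetical parameter): N(i) holds i itself plus every j
    whose base similarity to i is at least `threshold`.
    K_nbhd(i, j) = sum_{a in N(i), b in N(j)} base[a][b] / (|N(i)| * |N(j)|)
    """
    n = len(base)
    nbrs = [[j for j in range(n) if j == i or base[i][j] >= threshold]
            for i in range(n)]
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = sum(base[a][b] for a in nbrs[i] for b in nbrs[j])
            out[i][j] = total / (len(nbrs[i]) * len(nbrs[j]))
    return out


def normalize(kernel: Matrix) -> Matrix:
    """Cosine-normalize so every sequence has self-similarity 1."""
    n = len(kernel)
    return [[kernel[i][j] / math.sqrt(kernel[i][i] * kernel[j][j])
             for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    # Toy base similarities: sequences 0 and 1 are close, 2 is distant.
    base = [[1.0, 0.8, 0.1],
            [0.8, 1.0, 0.2],
            [0.1, 0.2, 1.0]]
    k = normalize(neighborhood_kernel(base, threshold=0.5))
    for row in k:
        print(["%.3f" % v for v in row])
```

In the toy run, sequences 0 and 1 share the same neighborhood and therefore receive identical kernel rows, which is how neighborhood averaging pulls related sequences together. The fixed `threshold` is the kind of hard cutoff in the similarity measure that the abstract identifies as a limitation of existing cluster kernels.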
