BMC Bioinformatics

An improved approach to infer protein-protein interaction based on a hierarchical vector space model

Abstract

Comparing and classifying the functions of gene products is important in today’s biomedical research. Semantic similarity derived from Gene Ontology (GO) annotations is regarded as one of the most widely used indicators of protein interaction. Among the various approaches proposed, those based on the vector space model are relatively simple, but their effectiveness is far from satisfactory. We propose a Hierarchical Vector Space Model (HVSM) for computing semantic similarity between genes or their products, which enhances the basic vector space model by introducing the relations between GO terms. Besides the directly annotated terms, HVSM also takes into account their ancestors and descendants related by “is_a” and “part_of” relations. Moreover, HVSM introduces the concept of a Certainty Factor to calibrate the semantic similarity based on the number of terms annotated to genes. To assess the performance of our method, we applied HVSM to Homo sapiens and Saccharomyces cerevisiae protein-protein interaction datasets. Compared with TCSS, Resnik, and other classic similarity measures, HVSM achieved a significant improvement in distinguishing positive from negative protein interactions. We also tested its correlation with sequence, EC, and Pfam similarity using the online tool CESSM. In terms of AUC scores, HVSM showed an improvement of up to 4% compared to TCSS, 8% compared to IntelliGO, 12% compared to basic VSM, 6% compared to Resnik, 8% compared to Lin, 11% compared to Jiang, 8% compared to Schlicker, and 11% compared to SimGIC. The CESSM test showed that HVSM was comparable to SimGIC and superior to TCSS and to all other similarity measures in CESSM. Supplementary information and the software are available at https://github.com/kejia1215/HVSM.
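
The abstract describes HVSM only at a high level, so the following Python snippet is a rough, hypothetical sketch of the general ideas it names, not the authors' implementation (that is in the GitHub repository above). It represents each gene as a vector over GO terms, extends the vector with ancestor and descendant terms reached through “is_a”/“part_of” edges, damps the cosine score with a certainty-factor style term based on annotation counts, and evaluates scores with AUC. The toy ontology `PARENTS`, the weights `DIRECT_WEIGHT` and `RELATED_WEIGHT`, and the certainty-factor formula with `CF_K` are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# Hypothetical GO fragment: child term -> parent terms ("is_a"/"part_of" edges merged).
PARENTS: Dict[str, Set[str]] = {
    "GO:B": {"GO:A"},
    "GO:C": {"GO:A"},
    "GO:D": {"GO:B"},
    "GO:E": {"GO:C"},
}

# Hypothetical weights; the abstract does not give HVSM's actual weighting scheme.
DIRECT_WEIGHT = 1.0   # directly annotated terms
RELATED_WEIGHT = 0.5  # ancestors and descendants of annotated terms
CF_K = 1.0            # placeholder constant for the certainty factor


def _closure(term: str, edges: Dict[str, Set[str]]) -> Set[str]:
    """All terms reachable from `term` by following `edges` transitively."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        for nxt in edges.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def _invert(edges: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Turn a child->parents map into a parent->children map."""
    inv: Dict[str, Set[str]] = defaultdict(set)
    for child, parents in edges.items():
        for p in parents:
            inv[p].add(child)
    return inv


def basic_vector(terms: Iterable[str]) -> Dict[str, float]:
    """Basic VSM: one dimension per directly annotated GO term."""
    return {t: DIRECT_WEIGHT for t in terms}


def hierarchical_vector(terms: Iterable[str], parents: Dict[str, Set[str]]) -> Dict[str, float]:
    """Add ancestors and descendants of each annotated term with a lower weight."""
    terms = set(terms)
    children = _invert(parents)
    vec: Dict[str, float] = {}
    for t in terms:
        for related in _closure(t, parents) | _closure(t, children):
            vec[related] = max(vec.get(related, 0.0), RELATED_WEIGHT)
    for t in terms:
        vec[t] = DIRECT_WEIGHT  # direct annotations always keep full weight
    return vec


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def certainty_factor(n1: int, n2: int) -> float:
    """Placeholder: trust a score less when either gene has few annotations."""
    n = min(n1, n2)
    return n / (n + CF_K)


def hvsm_like_similarity(g1: Set[str], g2: Set[str], parents: Dict[str, Set[str]]) -> float:
    """Hierarchical cosine similarity scaled by the placeholder certainty factor."""
    raw = cosine(hierarchical_vector(g1, parents), hierarchical_vector(g2, parents))
    return raw * certainty_factor(len(g1), len(g2))


def auc(pos: List[float], neg: List[float]) -> float:
    """ROC AUC as the probability that a positive pair outscores a negative pair."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    g1, g2 = {"GO:D"}, {"GO:B"}
    print(cosine(basic_vector(g1), basic_vector(g2)))  # 0.0: no shared GO term
    print(cosine(hierarchical_vector(g1, PARENTS), hierarchical_vector(g2, PARENTS)))
```

In this toy example, two genes annotated to a parent term and its child share no directly annotated term, so the basic vector space model scores them 0.0, while the hierarchical vectors overlap through the shared ancestor and descendant dimensions and yield a positive score. The `auc` helper uses the rank-based (Mann-Whitney) formulation, counting ties as half a win.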
