Journal: Bioinformatics

A novel methodology on distributed representations of proteins using their interacting ligands

Abstract

Motivation: The effective representation of proteins is a crucial task that directly affects performance on many bioinformatics problems. Related proteins usually bind to similar ligands. Chemical characteristics of ligands are known to capture the functional and mechanistic properties of proteins, suggesting that a ligand-based approach can be used for protein representation. In this study, we propose SMILESVec, a Simplified Molecular Input Line Entry System (SMILES)-based method to represent ligands, and a novel method to compute the similarity of proteins by describing them based on their ligands. Proteins are defined using the word embeddings of the SMILES strings of their ligands. The performance of the proposed protein description method is evaluated on a protein clustering task using the TransClust and MCL algorithms. It is compared against two protein representation methods that use protein sequence, Basic Local Alignment Search Tool (BLAST) and ProtVec, and two compound fingerprint-based protein representation methods.
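
The ligand-based description in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that SMILES strings are split into overlapping fixed-length "words", that a ligand vector is the mean of its word vectors, that a protein vector is the mean of its ligand vectors, and that proteins are compared by cosine similarity. None of these details are stated in the abstract. The word length k=8, the dimension DIM, the hash-based toy_embedding (a stand-in for embeddings that would normally be learned from a large SMILES corpus), and the two ligand lists are all hypothetical.

```python
import hashlib
import math

DIM = 16  # illustrative embedding size (assumption, not from the abstract)


def smiles_words(smiles, k=8):
    """Split a SMILES string into overlapping k-character 'words'."""
    if len(smiles) <= k:
        return [smiles]
    return [smiles[i:i + k] for i in range(len(smiles) - k + 1)]


def toy_embedding(word):
    """Deterministic stand-in for a trained word embedding (NOT learned)."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    # Map the first DIM bytes of the hash into the range [-1, 1].
    return [(b - 127.5) / 127.5 for b in digest[:DIM]]


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def ligand_vector(smiles, k=8):
    """Assumed ligand representation: mean of its word vectors."""
    return mean_vector([toy_embedding(w) for w in smiles_words(smiles, k)])


def protein_vector(ligand_smiles, k=8):
    """Assumed protein representation: mean of its ligand vectors."""
    return mean_vector([ligand_vector(s, k) for s in ligand_smiles])


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical ligand sets for two hypothetical proteins.
protein_a = ["CC(=O)OC1=CC=CC=C1C(=O)O", "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O"]
protein_b = ["CC(=O)OC1=CC=CC=C1C(=O)O", "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"]

similarity = cosine(protein_vector(protein_a), protein_vector(protein_b))
print(f"ligand-based protein similarity: {similarity:.3f}")
```

With real embeddings, pairwise similarities computed this way could be passed to a clustering algorithm such as TransClust or MCL, which is the evaluation setting the abstract describes.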
