IEEE/ACM Transactions on Computational Biology and Bioinformatics

A Sequence-Based Dynamic Ensemble Learning System for Protein Ligand-Binding Site Prediction

Abstract

Background: Proteins have the fundamental ability to bind selectively to other molecules and to perform specific functions through such interactions, protein-ligand binding being one example. Accurate prediction of the protein residues that physically bind to ligands is important for drug design and protein docking studies. Most successful protein-ligand binding predictions have been based on known structures. In practice, however, structural information is often unavailable because of the huge gap between the number of known protein sequences and the number of experimentally solved structures.

Results: This paper proposes a dynamic ensemble approach that identifies protein-ligand binding residues from sequence information alone. To avoid problems caused by the highly imbalanced numbers of ligand-binding and non-ligand-binding sites, we constructed several balanced data sets and trained a random forest classifier on each of them. For each query protein, we dynamically selected a subset of these classifiers according to the similarity between the query and the proteins in the training data set, and combined the predictions of the selected classifiers to produce the final prediction. The ensemble of these classifiers forms a sequence-based predictor of protein-ligand binding sites.

Conclusions: Experimental results on two Critical Assessment of protein Structure Prediction (CASP) datasets and the ccPDB dataset demonstrated that our proposed method compared favorably with state-of-the-art methods.

Availability: http://www2.ahu.edu.cn/pchen/web/LigandDSES.htm
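The abstract describes the pipeline only at a high level (balanced training sets, one classifier per set, similarity-based selection per query protein, combined predictions). The Python sketch below illustrates that general idea of dynamic ensemble selection under several assumptions that are not taken from the paper: a nearest-centroid learner stands in for the random forest to keep the example dependency-free; the training proteins are pre-split into hypothetical groups, each yielding one undersampled, balanced training set; proteins are compared by cosine similarity of hypothetical per-protein profile vectors; and the selected classifiers are combined by majority vote. All function names, the toy features and the `top_m` parameter are illustrative only.

```python
import math
import random

# Hypothetical sketch, NOT the authors' method: nearest-centroid stands in for
# random forest, protein groups and profile vectors are assumed inputs, and the
# selected classifiers are combined by majority vote.

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def centroid(rows):
    """Component-wise mean of a non-empty list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

class NearestCentroid:
    """Stand-in base learner: label 1 if a residue is closer to the binding centroid."""
    def fit(self, samples):
        # samples: (protein_id, features, label); both labels must be present
        self.c1 = centroid([f for _, f, y in samples if y == 1])
        self.c0 = centroid([f for _, f, y in samples if y == 0])
        return self

    def predict(self, x):
        return 1 if math.dist(x, self.c1) < math.dist(x, self.c0) else 0

def balanced_set(samples, rng):
    """Undersample non-binding residues down to the number of binding residues."""
    pos = [s for s in samples if s[2] == 1]
    neg = [s for s in samples if s[2] == 0]
    return pos + rng.sample(neg, min(len(pos), len(neg)))

def train_ensemble(samples, groups, seed=0):
    """Train one classifier per (hypothetical) protein group on its balanced data."""
    rng = random.Random(seed)
    ensemble = []
    for group in groups:
        subset = [s for s in samples if s[0] in group]
        clf = NearestCentroid().fit(balanced_set(subset, rng))
        ensemble.append((frozenset(group), clf))
    return ensemble

def predict_dynamic(ensemble, profiles, query_profile, query_residues, top_m=3):
    """Keep the top_m classifiers whose training proteins best match the query
    protein, then label each query residue by majority vote."""
    def relevance(member):
        proteins, _ = member
        return max(cosine(query_profile, profiles[p]) for p in proteins)

    ranked = sorted(ensemble, key=relevance, reverse=True)[:top_m]
    chosen = [clf for _, clf in ranked]
    labels = []
    for x in query_residues:
        votes = sum(clf.predict(x) for clf in chosen)
        labels.append(1 if 2 * votes > len(chosen) else 0)  # ties -> non-binding
    return labels

# Toy usage: group A binds at feature (1, 0), group B at (0, 1).
samples = [
    ("a1", [1.0, 0.0], 1), ("a1", [0.0, 1.0], 0),
    ("a2", [0.9, 0.1], 1), ("a2", [0.1, 0.9], 0),
    ("b1", [0.0, 1.0], 1), ("b1", [1.0, 0.0], 0),
]
profiles = {"a1": [1.0, 0.0], "a2": [0.9, 0.1], "b1": [0.0, 1.0]}
ensemble = train_ensemble(samples, groups=[{"a1", "a2"}, {"b1"}])
residues = [[1.0, 0.0], [0.0, 1.0]]
print(predict_dynamic(ensemble, profiles, [1.0, 0.05], residues, top_m=1))  # → [1, 0]
print(predict_dynamic(ensemble, profiles, [0.0, 1.0], residues, top_m=1))   # → [0, 1]
```

In the toy run, the same residues receive opposite labels depending on which training group the query protein resembles, which is the point of selecting classifiers per query rather than always using the full ensemble. Undersampling each set to equal class sizes is one common way to build balanced data; the paper's actual construction, similarity measure and combination rule may differ.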

Bibliographic Details

  • Author affiliations

    Institute of Health Sciences, Anhui University, Hefei, Anhui, China;

    Institute of Health Sciences, Anhui University, Hefei, Anhui, China;

    College of Electrical Engineering and Automation, Anhui University, Hefei, Anhui, China;

    Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia;

    Advanced Analytics Institute, University of Technology, Sydney, N.S.W., Australia;

    Institute of Health Sciences, Anhui University, Hefei, Anhui, China;

    School of Electronics and Information Engineering, Tongji University, Shanghai, China;

  • Original format: PDF
  • Language: English
  • Keywords

    Proteins; Vegetation; Correlation; Ions; Iron; Amino acids;