Journal: Molecules

iPPBS-Opt: A Sequence-Based Ensemble Classifier for Identifying Protein-Protein Binding Sites by Optimizing Imbalanced Training Datasets


Abstract

Knowledge of protein-protein interactions and their binding sites is indispensable for an in-depth understanding of the networks in living cells. With the avalanche of protein sequences generated in the postgenomic age, it is critical to develop computational methods that identify protein-protein binding sites (PPBSs) in a timely fashion from sequence information alone, because information obtained in this way can be used for both biomedical research and drug development. To address this challenge, we have proposed a new predictor, called iPPBS-Opt, in which we have used: (1) the K-Nearest Neighbors Cleaning (KNNC) and Inserting Hypothetical Training Samples (IHTS) treatments to optimize the training dataset; (2) the ensemble voting approach to select the most relevant features; and (3) the stationary wavelet transform to formulate the statistical samples. Cross-validation tests against experimentally confirmed results have demonstrated that the new predictor is very promising, implying that the aforementioned practices are indeed very effective. In particular, using wavelets to express protein/peptide sequences might be the key to grasping the problem's essence, fully consistent with the findings that many important biological functions of proteins can be elucidated through their low-frequency internal motions. For the convenience of most experimental scientists, we have provided a step-by-step guide on how to use the predictor's web server (http://www.jci-bioinfo.cn/iPPBS-Opt) to get the desired results without needing to work through the complicated mathematical equations involved.
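The abstract names the KNNC and IHTS treatments without giving their details, so the following is only a minimal sketch of the general idea of balancing an imbalanced training set, not the authors' implementation. It assumes that cleaning means dropping any negative (non-binding) sample that has a positive sample among its k nearest neighbours, and that hypothetical samples are inserted by SMOTE-style linear interpolation between two positive samples. The function names (`knn_clean`, `insert_hypothetical`, `balance`), the Euclidean distance, and the default `k=3` are all illustrative assumptions.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_clean(negatives, positives, k=3):
    """Keep only negatives whose k nearest neighbours are all negative.

    Assumed cleaning rule: a negative sample lying close to any positive
    sample is treated as ambiguous and removed from the training set.
    """
    labelled = [(p, 1) for p in positives] + [(n, 0) for n in negatives]
    kept = []
    for i, neg in enumerate(negatives):
        self_index = len(positives) + i  # position of this sample in `labelled`
        neighbours = [
            (euclidean(neg, vec), label)
            for j, (vec, label) in enumerate(labelled)
            if j != self_index
        ]
        neighbours.sort(key=lambda pair: pair[0])
        if all(label == 0 for _, label in neighbours[:k]):
            kept.append(neg)
    return kept


def insert_hypothetical(positives, n_new, seed=0):
    """Create n_new synthetic positives by interpolating between real ones.

    Assumed SMOTE-style rule; requires at least two positive samples
    whenever n_new > 0.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(positives, 2)
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic


def balance(negatives, positives, k=3, seed=0):
    """Clean the negatives, then pad the positives up to the same count."""
    cleaned = knn_clean(negatives, positives, k)
    deficit = max(0, len(cleaned) - len(positives))
    return cleaned, positives + insert_hypothetical(positives, deficit, seed)
```

Calling `balance(negatives, positives)` on lists of numeric feature vectors returns the cleaned negative set and a positive set padded to the same size. In practice the vectors would be the sequence-derived features described in the abstract, which this sketch does not compute.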
