Prediction of Protein-Protein Interaction Sites with Machine-Learning-Based Data-Cleaning and Post-Filtering Procedures

Liu Guang-Hui; Shen Hong-Bin; Yu Dong-Jun

Abstract

Accurate prediction of protein-protein interaction sites (PPIs) is an active research topic because such predictions have proven very useful for understanding disease mechanisms and for drug design. Machine-learning-based computational approaches have been widely applied to PPI prediction and have proven useful. However, traditional machine learning algorithms often assume that the classes are balanced, so applying them directly to PPI prediction, where class imbalance is severe, often yields poor performance. In this study, we propose a novel method that improves PPI prediction performance by relieving the severity of class imbalance with a data-cleaning procedure and by reducing predicted false positives with a post-filtering procedure. First, a machine-learning-based data-cleaning procedure removes marginal samples from the majority class, because such samples may hinder training a model with a clear classification boundary; this relieves the class imbalance in the original training dataset. Next, a prediction model is trained on the cleaned dataset. Finally, an effective post-filtering procedure is applied to reduce potential false-positive predictions. Stringent cross-validation and independent validation tests on benchmark datasets demonstrated the efficacy of the proposed method, which shows highly competitive performance compared with existing state-of-the-art sequence-based PPI predictors and should supplement existing PPI prediction methods.
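The abstract outlines a three-stage pipeline (clean the majority class, train a model, post-filter the predictions) but does not specify the cleaning classifier or the filtering rule. The following is a minimal Python sketch under stated assumptions, not the authors' method. As a stand-in for the machine-learning-based cleaning step, it uses a k-nearest-neighbour rule in the spirit of Wilson's edited nearest neighbour (ENN) undersampling, applied only to the majority (assumed non-interface, label 0) class. As a hypothetical post-filter, it clears any predicted interaction site that has no other predicted site within a sequence window. The function names `clean_majority`, `knn_labels` and `post_filter`, the parameters `k` and `window`, and the toy data are all illustrative. The training step in between (the keywords mention random forests) is omitted.

```python
import math
from collections import Counter


def knn_labels(X, y, i, k):
    """Return the labels of the k nearest neighbours of sample i (Euclidean), excluding i."""
    dists = sorted((math.dist(X[i], X[j]), j) for j in range(len(X)) if j != i)
    return [y[j] for _, j in dists[:k]]


def clean_majority(X, y, majority=0, k=3):
    """ENN-style cleaning applied only to the majority class (assumed label 0).

    Minority samples are always kept. A majority sample is treated as
    "marginal" and dropped unless strictly more than half of its k nearest
    neighbours (computed on the original data) share its label.
    """
    keep = []
    for i, label in enumerate(y):
        if label != majority:
            keep.append(i)
            continue
        votes = Counter(knn_labels(X, y, i, k))
        if votes[majority] > k // 2:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def post_filter(pred, window=2):
    """Hypothetical post-filter: clear a predicted site (1) that has no other
    predicted site within +/- window positions along the sequence."""
    out = list(pred)
    for i, p in enumerate(pred):
        if p != 1:
            continue
        lo, hi = max(0, i - window), min(len(pred), i + window + 1)
        if not any(pred[j] == 1 for j in range(lo, hi) if j != i):
            out[i] = 0
    return out


if __name__ == "__main__":
    # Toy 2-D features: label 1 = interface (minority), 0 = non-interface (majority).
    X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
         (1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (0.95, 0.95)]
    y = [0, 0, 0, 1, 1, 1, 0]
    Xc, yc = clean_majority(X, y)
    print(yc)  # [0, 0, 0, 1, 1, 1]
    # A classifier (e.g. a random forest) would be trained on (Xc, yc) here.
    print(post_filter([0, 1, 0, 0, 0, 0, 1, 1, 0]))  # [0, 0, 0, 0, 0, 0, 1, 1, 0]
```

In this toy run the majority point at (0.95, 0.95), whose three nearest neighbours are all minority samples, is dropped; the post-filter clears the isolated prediction at index 1 while keeping the adjacent pair at indices 6 and 7. Cleaning only the majority class, rather than both classes as standard ENN does, follows the abstract's stated aim of removing marginal samples from the majority samples.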

Bibliographic Details

Source
The Journal of Membrane Biology: An International Journal for Studies on the Structure, Function & Genesis of Biomembranes | 2016, No. 1 | 13 pages
Authors
Liu Guang-Hui; Shen Hong-Bin; Yu Dong-Jun
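Affiliations

Nanjing University of Science and Technology, School of Computer Science and Engineering, Xiaolingwei 200, Nanjing 210094, Jiangsu, People's Republic of China;

Shanghai Jiao Tong University, Institute of Image Processing and Pattern Recognition, Dongchuan Road 800, Shanghai 200240;

Nanjing University of Science and Technology, School of Computer Science and Engineering, Xiaolingwei 200, Nanjing 210094, Jiangsu, People's Republic of China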
Indexing Information
Original format: PDF
Text language: English
Chinese Library Classification: Molecular biology
Keywords
Protein-protein interaction sites; Imbalanced learning; Data cleaning; Random forests; Post-filtering

Similar Literature

1. Prediction of Protein-Protein Interaction Sites with Machine-Learning-Based Data-Cleaning and Post-Filtering Procedures [J]. Liu Guang-Hui, Shen Hong-Bin, Yu Dong-Jun. The Journal of Membrane Biology: An International Journal for Studies on the Structure, Function & Genesis of Biomembranes. 2016, No. 2

2. Evaluation of the third solvent clusters fitting procedure for the prediction of protein-protein interactions based on the results at the CAPRI blind docking study [J]. Komatsu K, Kurihara Y, Iwadate M. Proteins: Structure, Function, and Genetics. 2003, No. 1

3. CRF-based models of protein surfaces improve protein-protein interaction site predictions [J]. Zhijie Dong, Keyu Wang, Truong K Linh Dang. BMC Bioinformatics. 2014, No. 1

4. Interaction site prediction by structural similarity to neighboring clusters in protein-protein interaction networks [C]. Hiroyuki Monji, Satoshi Koizumi, Tomonobu Ozaki. Asia-Pacific Bioinformatics Conference. 2012

5. Structure-based prediction of protein-protein interaction sites [D]. Jordan, Rafael A. 2012

6. CRF-based models of protein surfaces improve protein-protein interaction site predictions [O]. Zhijie Dong, Keyu Wang, Truong Khanh Linh Dang. 2014

7. Bioinformatics Approach to Probe Protein-Protein Interactions: Understanding the Role of Interfacial Solvent in the Binding Sites of Protein-Protein Complexes; Network Based Predictions and Analysis of Human Proteins that Play Critical Roles in HIV Pathogenesis [O]. Habtemariam Mesay. 2013
