Journal: Annals of the American Thoracic Society

Efficiently Predicting Hot Spots in PPIs by Combining Random Forest and Synthetic Minority Over-Sampling Technique



Abstract

Hot spot residues in protein-protein interactions (PPIs) play a vital role in bioinformatics applications such as drug design, where they help guide the search for new medications. However, current datasets are composed predominantly of non-hot spots, with only a tiny percentage of hot spots. Conventional hot spot prediction methods therefore face great challenges from imbalanced training samples. This paper presents a classification method that combines random forest classification with an oversampling strategy to improve training performance. The oversampling strategy generates additional hot spot data to balance the given training set. Random forest classification is then applied to build a set of decision trees on this oversampled training set. The final prediction performance can be computed recursively after the oversampling and training process. Because the proposed method selects features at random while constructing the trees, it yields a robust random forest that avoids overfitting the training set. Experimental results on three data sets indicate that hot spot prediction performance is significantly improved compared with existing classification methods.
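The oversampling step named in the title (Synthetic Minority Over-Sampling Technique) can be illustrated with a minimal sketch. The abstract does not give the paper's settings, so everything here is an assumption for illustration only: the function name `smote`, the neighbour count `k=3`, the fixed seed, and the made-up two-feature toy data. It is not the authors' implementation. Each synthetic hot spot is placed at a random position on the line segment between a minority sample and one of its nearest minority neighbours.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points by interpolating between a
    randomly chosen minority point and one of its k nearest minority
    neighbours (hypothetical helper, parameters are assumptions)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data, invented for illustration: two features per residue.
hot_spots = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.3], [1.1, 2.1]]
non_hot_spots = [[3.0 + 0.1 * i, 0.5 + 0.05 * i] for i in range(12)]

# Generate enough synthetic hot spots to match the majority class size.
new_points = smote(hot_spots, n_new=len(non_hot_spots) - len(hot_spots))
balanced_hot_spots = hot_spots + new_points
```

In the pipeline the abstract describes, the balanced set (`balanced_hot_spots` plus `non_hot_spots` in this sketch) would then be used to train the random forest classifier. Because each synthetic point is interpolated between existing hot spots, the class balance improves without duplicating samples exactly.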
