To improve the accuracy of protein-protein interaction site prediction, this paper proposes a novel feature representation, the weighted PSSM histogram, derived from the position-specific scoring matrix (PSSM) of a protein. To address the extreme class imbalance in the training data, under-sampling is combined with classifier ensembling to train an ensemble of random forest classifiers. Compared with traditional features, the proposed features have a lower dimension while retaining better discriminative power. The classifier ensemble mitigates the information loss caused by under-sampling and improves classification accuracy. Experimental results show that the method is effective and outperforms other state-of-the-art protein-protein interaction site prediction methods on benchmark datasets.
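The combination of under-sampling and classifier ensembling described in the abstract can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything here is an assumption: the function names, the number of ensemble members, and majority voting as the combination rule are all hypothetical. To keep the example dependency-free, a simple nearest-centroid learner stands in for the random forests used in the paper.

```python
import random
from statistics import mean


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [mean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestCentroid:
    """Hypothetical stand-in base learner (the paper trains random forests)."""

    def fit(self, X, y):
        self.pos = centroid([x for x, t in zip(X, y) if t == 1])
        self.neg = centroid([x for x, t in zip(X, y) if t == 0])
        return self

    def predict(self, x):
        return 1 if sq_dist(x, self.pos) < sq_dist(x, self.neg) else 0


def train_undersampled_ensemble(X, y, n_members=5, seed=0,
                                make_learner=NearestCentroid):
    """Train one learner per balanced subset of the training data.

    Each subset keeps every minority-class sample and draws an equally
    sized random sample (without replacement) from the majority class.
    """
    rng = random.Random(seed)
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    if len(pos) <= len(neg):
        minority, majority, min_label = pos, neg, 1
    else:
        minority, majority, min_label = neg, pos, 0

    members = []
    for _ in range(n_members):
        sample = rng.sample(majority, len(minority))
        Xb = minority + sample
        yb = [min_label] * len(minority) + [1 - min_label] * len(sample)
        members.append(make_learner().fit(Xb, yb))
    return members


def predict_majority(members, x):
    """Combine member predictions by majority vote (an assumed rule)."""
    votes = sum(m.predict(x) for m in members)
    return 1 if 2 * votes > len(members) else 0
```

Because each member sees a different random slice of the majority class, the ensemble as a whole uses more of the majority-class data than any single under-sampled classifier would, which is the intuition behind the abstract's claim that ensembling offsets the information lost to under-sampling.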