A MapReduce based parallel SVM for large-scale predicting protein-protein interactions

Abstract

Protein-protein interactions (PPIs) are crucial to most biochemical processes, including metabolic cycles, DNA transcription and replication, and signaling cascades. Although high-throughput experimental techniques have generated a large amount of PPI data for different species, the number of known interactions is still small compared with the total number of possible PPIs. Furthermore, experimental methods for identifying PPIs are both time-consuming and expensive. It is therefore urgent, and challenging, to develop automated computational methods that predict PPIs efficiently and accurately. In this article, we propose a novel MapReduce-based parallel SVM model for large-scale prediction of protein-protein interactions using only protein sequence information. First, local sequential features, represented by an autocorrelation descriptor, are extracted from the protein sequences. Then the MapReduce framework is employed to train support vector machine (SVM) classifiers in a distributed way, which significantly reduces training time while maintaining a high level of accuracy. The experimental results demonstrate that the proposed parallel algorithms not only can handle large-scale PPI datasets but also perform well in terms of speedup and accuracy. Consequently, the proposed approach can be considered a promising and powerful new tool for large-scale PPI prediction, offering excellent performance in less time.
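The abstract does not say which autocorrelation descriptor or which physicochemical properties the authors use. As a rough illustration of the feature-extraction step only, the sketch below computes an auto-covariance descriptor over a single property scale and concatenates the descriptors of two proteins into one pair vector. The choice of auto covariance, the Kyte-Doolittle hydropathy scale, the `max_lag` value, and the function names `auto_covariance` and `pair_features` are all assumptions made for this example, not details taken from the paper.

```python
from statistics import fmean

# Kyte-Doolittle hydropathy index. Using this scale is an assumption for
# illustration; the abstract does not name the properties the authors used.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def auto_covariance(sequence, max_lag=3, scale=HYDROPATHY):
    """Return [AC(1), ..., AC(max_lag)] for one protein sequence.

    AC(lag) is the mean, over all positions i, of the product of the
    mean-centered property values at positions i and i + lag.
    Residues missing from the scale raise KeyError.
    """
    values = [scale[residue] for residue in sequence]
    n = len(values)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    mean = fmean(values)
    centered = [value - mean for value in values]
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]


def pair_features(seq_a, seq_b, max_lag=3):
    """Concatenate the descriptors of two proteins into one pair vector."""
    return auto_covariance(seq_a, max_lag) + auto_covariance(seq_b, max_lag)
```

Each pair vector has a fixed length (`2 * max_lag` here) regardless of sequence length, which is what allows variable-length sequences to be fed to an SVM. In a MapReduce setting, a function like `pair_features` would typically run independently on each record of a data partition.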