BMC Bioinformatics

Performance improvement for a 2D convolutional neural network by using SSC encoding on protein–protein interaction tasks

Abstract

Protein interactions are determined by protein sequences and affect the regulation of the cell cycle, signal transduction, and metabolism, which makes them highly significant for modern proteomics research. Despite advances in experimental technology, determining protein–protein interactions (PPIs) is still expensive, laborious, and time-consuming, and there is strong demand for effective bioinformatics approaches to identify potential PPIs. Given the large amount of PPI data, a high-performance processor can be used to enhance deep learning methods that predict interactions directly from protein sequences. We propose the Sequence-Statistics-Content (SSC) protein sequence encoding format, which is based on information extracted from the original sequence, to further improve the performance of the convolutional neural network. The original protein sequences are encoded in a three-channel format by introducing statistical information (the second channel) and bigram encoding information (the third channel), which can add distinguishing sequence features and so enhance the performance of the deep learning model. On protein–protein interaction prediction tasks, the 2D convolutional neural network (2D CNN) with SSC encoding gives better results than the 1D CNN with one-hot encoding. Independent validation on new interactions from the HIPPIE database (version 2.1, published on July 18, 2017) and validation of directly predicted results with a molecular docking tool indicate the effectiveness of the proposed protein encoding in the CNN model. The proposed protein sequence encoding method is efficient at improving the capability of the CNN model on protein sequence-related tasks and may also be effective at enhancing other machine learning or deep learning methods.
Prediction accuracy and molecular docking validation showed considerable improvement over the existing one-hot encoding method, indicating that the SSC encoding method may be useful for protein sequence-related tasks. The source code of the proposed methods is freely available for academic research at https://github.com/wangy496/SSC-format/ .
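
The three-channel idea described in the abstract can be illustrated with a small sketch. The abstract does not specify how the statistics and bigram channels are computed, so everything below is an assumption made for illustration and not the authors' SSC implementation (which is in the repository linked above): the function name `encode_three_channel`, the `max_len` parameter, the use of within-sequence residue frequency for the second channel, and the choice to mark the following residue for the third channel are all hypothetical.

```python
# Illustrative sketch only: NOT the published SSC format.
# Assumed layout: 3 channels x max_len positions x 20 residue columns.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def encode_three_channel(seq, max_len):
    """Encode a protein sequence as nested lists of shape [3][max_len][20].

    Channel 0: one-hot encoding of the residue at each position.
    Channel 1 (assumed statistic): relative frequency of that residue
               within the sequence, placed in the residue's column.
    Channel 2 (assumed bigram form): one-hot of the NEXT residue, so that
               channels 0 and 2 together identify the bigram at a position.
    Unknown residues and padding positions are left as all-zero rows.
    """
    seq = seq.upper()[:max_len]  # truncate to the fixed length
    n = len(AMINO_ACIDS)
    channels = [[[0.0] * n for _ in range(max_len)] for _ in range(3)]

    # Relative frequency of each known residue in this sequence.
    known = [aa for aa in seq if aa in AA_INDEX]
    freq = {aa: known.count(aa) / len(known) for aa in set(known)}

    for pos, aa in enumerate(seq):
        if aa not in AA_INDEX:
            continue  # skip ambiguous codes such as X or B
        col = AA_INDEX[aa]
        channels[0][pos][col] = 1.0
        channels[1][pos][col] = freq[aa]
        nxt = seq[pos + 1] if pos + 1 < len(seq) else None
        if nxt in AA_INDEX:
            channels[2][pos][AA_INDEX[nxt]] = 1.0
    return channels


encoded = encode_three_channel("ACCA", max_len=6)
print(len(encoded), len(encoded[0]), len(encoded[0][0]))
```

Stacking the channels this way gives each sequence an image-like tensor, which is what lets a 2D CNN be applied where a one-hot matrix alone would typically feed a 1D CNN.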
