Journal: Current Bioinformatics

An Empirical Study of Features Fusion Techniques for Protein-Protein Interaction Prediction


Abstract

With the recent development of bioinformatics, the importance of understanding protein function has been widely acknowledged. Most proteins perform their functions by interacting with other proteins, so exploring protein-protein interactions (PPIs) is an urgent task. Predicting PPIs remains a hard problem: although a variety of computational methods have been proposed to identify PPIs, most of them are complex and have low accuracy. Traditional methods build features in two steps: first, they extract a feature from each of the two proteins in a PPI; second, they treat the two features as strings and concatenate them. Concatenation is the result of an addition operation on strings. It increases the number of redundant features, and its result depends on the order in which the two features are concatenated. Motivated by this, we study feature fusion and feature selection in this paper. The presented framework consists of three stages. In the first stage, we obtain the negative data set from an off-the-shelf database; the reliability of the negative data sets used in previous studies is not our concern here. In the second stage, the n-gram frequency method is used to preprocess the PPI sequences. In the third stage, the final feature is spliced together, and feature selection is then applied to find the optimal feature. Finally, an effective parameter for the Random Forest classifier is selected. Experiments on a real data set show that our feature fusion method outperforms traditional methods in protein-protein interaction prediction. These encouraging results may help future research on protein function.
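The order dependence of concatenation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the fusion operator, the n-gram length, or the alphabet, so everything below is an assumption for illustration only: the 20-letter amino acid alphabet, n = 2, the two toy sequences, and the helper names `ngram_frequencies`, `concatenate`, and `symmetric_fusion`. The symmetric fusion shown (element-wise sum plus absolute difference) is a swapped-in example of an order-independent operator, not the method of the paper.

```python
from collections import Counter
from itertools import product

# Assumption: the standard 20-letter amino acid alphabet.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ngram_frequencies(sequence, n=2, alphabet=AMINO_ACIDS):
    """Return normalized n-gram frequencies in a fixed vocabulary order."""
    vocabulary = ["".join(p) for p in product(alphabet, repeat=n)]
    counts = Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))
    total = sum(counts[gram] for gram in vocabulary)
    if total == 0:
        # Sequence shorter than n, or no n-gram inside the alphabet.
        return [0.0] * len(vocabulary)
    return [counts[gram] / total for gram in vocabulary]


def concatenate(features_a, features_b):
    """Traditional pair feature: depends on which protein comes first."""
    return features_a + features_b


def symmetric_fusion(features_a, features_b):
    """Hypothetical order-independent fusion: sums, then absolute differences."""
    sums = [a + b for a, b in zip(features_a, features_b)]
    diffs = [abs(a - b) for a, b in zip(features_a, features_b)]
    return sums + diffs


# Toy sequences, invented for this sketch only.
protein_a = "MKTAYIAKQR"
protein_b = "GSHMLEDPVA"

fa = ngram_frequencies(protein_a)
fb = ngram_frequencies(protein_b)

# Swapping the two proteins changes the concatenated vector ...
order_matters = concatenate(fa, fb) != concatenate(fb, fa)
# ... but leaves the symmetric fusion unchanged.
order_free = symmetric_fusion(fa, fb) == symmetric_fusion(fb, fa)
```

With concatenation, the pair (A, B) and the pair (B, A) produce different inputs for the classifier even though they describe the same interaction. An order-independent operator removes that ambiguity; whether it also reduces redundancy depends on the feature selection step that follows.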
