Journal: BMC Bioinformatics

Integrative approaches to the prediction of protein functions based on the feature selection

Abstract

Background
Protein function prediction has been one of the most important problems in functional genomics. Given the variety of genomic data sets now available, many researchers have attempted to develop integration models that combine all available genomic data for protein function prediction. These efforts have improved prediction quality and extended prediction coverage. However, it has also been observed that integrating more data sources does not always improve prediction quality. Selecting the data sources that contribute most to protein function prediction has therefore become an important issue.

Results
We present systematic feature selection methods that assess how much each genome-wide data set contributes to predicting protein functions, and we use them to investigate the relationship between genomic data sources and protein functions. In this study, we use ten different genomic data sources for Mus musculus, including protein domains, protein-protein interactions, gene expression, phenotype ontology, phylogenetic profiles, and disease data, to predict protein functions labelled with Gene Ontology (GO) terms. We apply two approaches to feature selection: exhaustive-search feature selection using kernel-based logistic regression (KLR), and kernel-based L1-norm regularized logistic regression (KL1LR). In the first approach, we exhaustively measure the contribution of each data set to each function based on the prediction quality it yields. In the second approach, we use the estimated feature coefficients as measures of the contribution of each data source. Our results show that the proposed methods improve prediction quality compared with both the full integration of all data sources and other filter-based feature selection methods. We also show that the contributing data sources can differ from one protein function to another. Furthermore, we observe that highly contributing data sets can be similar within a group of protein functions that share the same parent in the GO hierarchy.

Conclusions
In contrast to previous integration methods, our approaches not only increase prediction quality but also gather information about which data sources contribute most to each protein function. This information can help researchers collect relevant data sources for annotating protein functions.
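The first approach, exhaustive search over subsets of data sources scored by KLR, can be sketched as follows. This is a minimal illustration built on our own assumptions, not the authors' implementation: each data source is a list of per-protein feature vectors, each source's kernel is a plain linear kernel, a subset of sources is represented by the unweighted sum of their kernels, KLR is fitted by gradient descent, and prediction quality is the AUC on a single held-out split. The source names `A` and `B`, the helper functions, and the toy values are all hypothetical.

```python
import math
from itertools import combinations


def linear_kernel(X, Z):
    """Gram matrix of dot products between the rows of X and the rows of Z (assumed kernel)."""
    return [[sum(a * b for a, b in zip(x, z)) for z in Z] for x in X]


def sum_kernels(X_sources, Z_sources, subset):
    """Unweighted sum of the per-source kernels for the sources in `subset`."""
    n, m = len(X_sources[subset[0]]), len(Z_sources[subset[0]])
    K = [[0.0] * m for _ in range(n)]
    for name in subset:
        Ks = linear_kernel(X_sources[name], Z_sources[name])
        for i in range(n):
            for j in range(m):
                K[i][j] += Ks[i][j]
    return K


def fit_klr(K, y, lam=0.01, lr=0.1, steps=500):
    """Kernel logistic regression f(x) = sum_j alpha_j k(x, x_j) + b, fitted by
    gradient descent on mean logistic loss + (lam / 2) * alpha^T K alpha."""
    n = len(y)
    alpha, b = [0.0] * n, 0.0
    for _ in range(steps):
        f = [sum(K[i][j] * alpha[j] for j in range(n)) + b for i in range(n)]
        r = [1.0 / (1.0 + math.exp(-fi)) - yi for fi, yi in zip(f, y)]  # p - y
        # K is symmetric, so the alpha-gradient is K (r / n + lam * alpha)
        grad = [sum(K[i][j] * (r[j] / n + lam * alpha[j]) for j in range(n))
                for i in range(n)]
        alpha = [a - lr * g for a, g in zip(alpha, grad)]
        b -= lr * sum(r) / n
    return alpha, b


def auc(scores, labels):
    """Probability that a positive outranks a negative (ties count one half)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def exhaustive_search(train, y_train, test, y_test):
    """Score every non-empty subset of data sources; return the best and all scores."""
    names = sorted(train)
    scores = {}
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            alpha, b = fit_klr(sum_kernels(train, train, subset), y_train)
            K_test = sum_kernels(test, train, subset)
            preds = [sum(k * a for k, a in zip(row, alpha)) + b for row in K_test]
            scores[subset] = auc(preds, y_test)
    best = max(scores, key=scores.get)  # first subset reaching the top score
    return best, scores


# Hypothetical toy data: source "A" tracks the label, source "B" does not.
train = {
    "A": [[1.0], [0.9], [1.1], [0.8], [-1.0], [-0.9], [-1.1], [-0.8]],
    "B": [[0.5], [-0.5], [0.5], [-0.5], [0.5], [-0.5], [0.5], [-0.5]],
}
y_train = [1, 1, 1, 1, 0, 0, 0, 0]
test = {"A": [[0.95], [1.05], [-0.95], [-1.05]], "B": [[0.5], [-0.5], [0.5], [-0.5]]}
y_test = [1, 1, 0, 0]

best, scores = exhaustive_search(train, y_train, test, y_test)
print(best, scores)
```

With ten data sources this loop would fit 2^10 - 1 = 1023 models per GO term, so the cost of exhaustive search grows quickly as sources are added.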
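The second approach reads each data source's contribution off estimated coefficients. The sketch below is an assumption-laden simplification, not the paper's exact KL1LR formulation: it derives one kernel-based feature per data source (a protein's mean linear-kernel similarity to the annotated positives minus its mean similarity to the negatives), fits L1-regularized logistic regression by proximal gradient descent (ISTA), and treats sources whose coefficient is driven to zero as non-contributing. The helper names, the source names `A` and `B`, and the data are hypothetical.

```python
import math


def linear_kernel(x, z):
    """Dot product of two feature vectors (assumed kernel)."""
    return sum(a * b for a, b in zip(x, z))


def source_features(sources, y):
    """One kernel-derived feature per (protein, data source): mean similarity
    to the annotated positives minus mean similarity to the negatives."""
    names = sorted(sources)
    pos = [j for j, lab in enumerate(y) if lab == 1]
    neg = [j for j, lab in enumerate(y) if lab == 0]
    X = []
    for i in range(len(y)):
        row = []
        for s in names:
            v = sources[s]
            to_pos = sum(linear_kernel(v[i], v[j]) for j in pos) / len(pos)
            to_neg = sum(linear_kernel(v[i], v[j]) for j in neg) / len(neg)
            row.append(to_pos - to_neg)
        X.append(row)
    return names, X


def soft_threshold(w, t):
    """Proximal operator of t * |w|: shrink toward zero, snapping small values to 0."""
    return 0.0 if abs(w) <= t else w - math.copysign(t, w)


def fit_l1_logreg(X, y, lam=0.05, lr=0.5, steps=1000):
    """L1-regularized logistic regression via proximal gradient descent (ISTA).
    The intercept b is not penalized."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        r = [1.0 / (1.0 + math.exp(-(sum(wk * xk for wk, xk in zip(w, x)) + b))) - yi
             for x, yi in zip(X, y)]  # p - y
        grad = [sum(r[i] * X[i][k] for i in range(n)) / n for k in range(d)]
        w = [soft_threshold(wk - lr * gk, lr * lam) for wk, gk in zip(w, grad)]
        b -= lr * sum(r) / n
    return w, b


# Hypothetical toy data: source "A" tracks the label, source "B" barely varies with it.
sources = {
    "A": [[1.0], [0.9], [1.1], [0.8], [-1.0], [-0.9], [-1.1], [-0.8]],
    "B": [[0.5], [-0.5], [0.5], [-0.5], [0.5], [-0.5], [0.5], [-0.4]],
}
y = [1, 1, 1, 1, 0, 0, 0, 0]

names, X = source_features(sources, y)
w, b = fit_l1_logreg(X, y)
contributions = dict(zip(names, w))  # |coefficient| read as the source's contribution
print(contributions)  # "B" ends at exactly 0.0; "A" keeps a positive weight
```

Because these toy features are computed from the same labels the model is trained on, a real analysis would derive them from held-out annotations.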
