IEEE Transactions on NanoBioscience

Constructing Query-Driven Dynamic Machine Learning Model With Application to Protein-Ligand Binding Sites Prediction

Abstract

Annotated biological data are now being generated rapidly and continuously, and effectively incorporating newly annotated data into the learning step is crucial for improving the performance of bioinformatics prediction models. Although machine-learning-based methods have been used extensively for a wide range of biological problems, existing approaches usually train static prediction models on fixed training datasets. Such static approaches have several disadvantages: they scale poorly and become impractical when the training dataset is huge. To address this, we propose a dynamic learning framework for constructing query-driven prediction models. The key difference from existing approaches is that the training set for the machine-learning algorithm is generated dynamically according to the query input. Traditional static methods instead train one general model regardless of the query. A query-driven predictor, trained on a smaller subset of data selected specifically from the entire annotated base dataset, is then applied to the query. This way of constructing the model lets us update the annotated base dataset flexibly. Using the most relevant core subset as the training set also gives the constructed model better generalization ability on the query, showing a “part could be better than all” phenomenon. Based on this framework, we implemented a dynamic protein-ligand binding site predictor called OSML (On-site model for ligand binding sites prediction). Computational experiments on 10 different ligand types at three hierarchically organized levels show that OSML outperforms most existing predictors. The results indicate that the dynamic framework is a promising direction for bridging the gap between rapidly accumulating annotated biological data and effective machine-learning-based predictors.
OSML web server and datasets are freely available at: http://www.csbio.sjtu.edu.cn/bioinf/OSML/ for academic use.
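The abstract does not say how OSML selects its core subset or which learner it trains per query. The sketch below only illustrates the general query-driven idea described above, and it is not OSML's implementation. It rests on several stated assumptions:

- Samples are numeric feature vectors with binary labels.
- Relevance to the query is measured by Euclidean distance.
- The per-query learner is a simple nearest-centroid classifier.
- The class name `QueryDrivenPredictor` and the subset size `k` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A labelled sample: (feature vector, binary label). Hypothetical encoding;
# a real binding-site predictor would derive features from protein residues.
Sample = Tuple[Sequence[float], int]


class QueryDrivenPredictor:
    """Hypothetical query-driven learner: a fresh model is fit per query."""

    def __init__(self, k: int) -> None:
        self.k = k                    # size of the per-query core subset (assumed parameter)
        self.base: List[Sample] = []  # growing annotated base dataset

    def add_annotations(self, samples: List[Sample]) -> None:
        # New annotated data are simply appended; there is no global model to retrain.
        self.base.extend(samples)

    def core_subset(self, query: Sequence[float]) -> List[Sample]:
        # Rank every base sample by Euclidean distance to the query; keep the k closest.
        ranked = sorted(self.base, key=lambda s: math.dist(s[0], query))
        return ranked[: self.k]

    def predict(self, query: Sequence[float]) -> int:
        core = self.core_subset(query)
        if not core:
            raise ValueError("annotated base dataset is empty")
        # Fit a nearest-centroid classifier on the core subset only.
        centroids: Dict[int, List[float]] = {}
        for label in sorted({lab for _, lab in core}):
            vecs = [x for x, lab in core if lab == label]
            centroids[label] = [sum(col) / len(vecs) for col in zip(*vecs)]
        # Predict the label whose centroid lies closest to the query.
        return min(centroids, key=lambda lab: math.dist(centroids[lab], query))


if __name__ == "__main__":
    model = QueryDrivenPredictor(k=3)
    model.add_annotations([
        ((0, 0), 1), ((0, 1), 1), ((10, 10), 1), ((10, 11), 1),
        ((5, 5), 0), ((5, 6), 0),
    ])
    print(model.predict((0, 0.5)))    # → 1
    print(model.predict((5, 5.4)))    # → 0
    print(model.predict((20, 19.5)))  # → 1
    model.add_annotations([((20, 20), 0)])
    print(model.predict((20, 19.5)))  # → 0
```

On these six toy points, a single nearest-centroid model trained on the whole base set would place both class centroids at (5, 5.5) and could not separate the classes. The per-query core subsets do separate them. This is only a toy echo of the “part could be better than all” observation, not evidence for it. The last two calls show that appending a newly annotated sample changes later predictions without retraining any global model.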