Journal: International journal of data mining, modelling and management

Distributed heterogeneous ensemble learning on Apache Spark for ligand-based virtual screening

Abstract

Virtual screening is one of the most common computer-aided drug design techniques. It applies computational tools and methods to large libraries of molecules in order to identify candidate drugs. Ensemble learning is a recent paradigm introduced to improve the predictive performance and robustness of machine learning, and it has been successfully applied in ligand-based virtual screening (LBVS) approaches. Applying ensemble learning to huge molecular libraries is computationally expensive. Hence, distributing and parallelising the task with frameworks such as Apache Spark has become an important step. In this paper, we propose a new approach, HEnsL_DLBVS, for heterogeneous ensemble learning distributed on Spark, to improve large-scale LBVS results. To handle the problem of imbalanced big training datasets, we propose a novel hybrid technique. We generate new training datasets to evaluate the approach. Experimental results confirm the effectiveness of our approach, which achieves satisfactory accuracy and outperforms homogeneous models.
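
The abstract does not say which base learners HEnsL_DLBVS combines or how their outputs are merged. As a rough illustration of what "heterogeneous" means here, the sketch below has three different kinds of classifier vote on whether a molecule is active. Everything in it is an assumption made for illustration: the toy descriptor vectors, the three learners, and the majority-vote rule. It runs on a single machine and does not use Spark.

```python
from collections import Counter
from math import dist

# Toy training set: (descriptor vector, label), 1 = active, 0 = inactive.
# The values are invented for illustration only.
TRAIN = [
    ((0.9, 0.8), 1),
    ((0.8, 0.9), 1),
    ((0.7, 0.7), 1),
    ((0.2, 0.1), 0),
    ((0.1, 0.3), 0),
    ((0.3, 0.2), 0),
]


def nearest_neighbour(x):
    """1-NN: copy the label of the closest training molecule."""
    _, label = min(TRAIN, key=lambda pair: dist(pair[0], x))
    return label


def nearest_centroid(x):
    """Assign the class whose mean descriptor vector is closest to x."""
    centroids = {}
    for cls in (0, 1):
        members = [v for v, y in TRAIN if y == cls]
        centroids[cls] = tuple(sum(col) / len(members) for col in zip(*members))
    return min(centroids, key=lambda cls: dist(centroids[cls], x))


def threshold_rule(x, cutoff=0.5):
    """Hand-set rule: active when the mean descriptor value exceeds cutoff."""
    return int(sum(x) / len(x) > cutoff)


# Heterogeneous ensemble: the members are different model types,
# not several copies of one model type (hypothetical choice of members).
BASE_LEARNERS = [nearest_neighbour, nearest_centroid, threshold_rule]


def ensemble_predict(x):
    """Majority vote over the base learners (odd count, so no ties)."""
    votes = Counter(model(x) for model in BASE_LEARNERS)
    return votes.most_common(1)[0][0]
```

For a borderline molecule such as `(0.45, 0.5)`, the 1-NN learner votes active while the other two vote inactive, so the ensemble returns inactive. Smoothing out the errors of individual model types in this way is the usual motivation for mixing them.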