Journal of Cheminformatics

Large-scale virtual screening on public cloud resources with Apache Spark


Abstract

Background: Structure-based virtual screening is an in-silico method to screen a target receptor against a virtual molecular library. Applying docking-based screening to large molecular libraries can be computationally expensive; however, it constitutes a trivially parallelizable task. Most of the available parallel implementations are based on the message passing interface, relying on low-failure-rate hardware and fast network connections. Google's MapReduce revolutionized large-scale analysis, enabling the processing of massive datasets on commodity hardware and cloud resources while providing transparent scalability and fault tolerance at the software level. Open-source implementations of MapReduce include Apache Hadoop and the more recent Apache Spark.

Results: We developed a method to run existing docking-based screening software on distributed cloud resources, utilizing the MapReduce approach. We benchmarked our method, which is implemented in Apache Spark, by docking a publicly available target receptor against ~2.2 million compounds. The performance experiments show a good parallel efficiency (87%) when running in a public cloud environment.

Conclusion: Our method enables parallel structure-based virtual screening on public cloud resources or commodity computer clusters. The degree of scalability that we achieve allows for trying out our method on relatively small libraries first and then scaling to larger libraries. Our implementation is named Spark-VS and is freely available as open source on GitHub (https://github.com/mcapuccini/spark-vs).

[Graphical abstract]
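The MapReduce-style pipeline described in the abstract, streaming a molecular library through an existing docking program in parallel and then reducing to the best-scoring compounds, can be sketched with Spark's Scala API. The snippet below is a minimal illustration under stated assumptions, not the Spark-VS implementation: the docking command `dock_cmd`, its flags, the input paths, and the tab-separated "id score" output format are hypothetical and only stand in for whatever docking tool is plugged in.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DockingSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("docking-sketch"))

    // Load the molecular library, one molecule (e.g. a SMILES string) per line.
    // The path is a placeholder for this example.
    val molecules = sc.textFile("hdfs:///data/library.smi")

    // "Map" step: stream each partition through an external docking program
    // via RDD.pipe. Here "dock_cmd" is a hypothetical CLI that reads molecules
    // on stdin and prints one "id<TAB>score" line per molecule on stdout.
    val scores = molecules
      .pipe(Seq("dock_cmd", "--receptor", "/data/receptor.pdb"))
      .map { line =>
        val Array(id, score) = line.split("\t")
        (id, score.toDouble)
      }

    // "Reduce" step: keep the top-scoring molecules across all partitions
    // (higher score assumed better in this sketch).
    val top = scores.sortBy({ case (_, s) => s }, ascending = false).take(30)
    top.foreach { case (id, s) => println(f"$id%s $s%.2f") }

    sc.stop()
  }
}
```

Spark takes care of partitioning the library, scheduling tasks across the cluster, and re-executing failed tasks, which is what provides the transparent scalability and software-level fault tolerance referred to in the abstract.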
