Journal of Cheminformatics

Large-scale virtual screening on public cloud resources with Apache Spark


Abstract

Background: Structure-based virtual screening is an in-silico method to screen a target receptor against a virtual molecular library. Applying docking-based screening to large molecular libraries can be computationally expensive; however, it constitutes a trivially parallelizable task. Most of the available parallel implementations are based on the message passing interface, relying on low-failure-rate hardware and fast network connections. Google's MapReduce revolutionized large-scale analysis, enabling the processing of massive datasets on commodity hardware and cloud resources while providing transparent scalability and fault tolerance at the software level. Open-source implementations of MapReduce include Apache Hadoop and the more recent Apache Spark.

Results: We developed a method to run existing docking-based screening software on distributed cloud resources, utilizing the MapReduce approach. We benchmarked our method, which is implemented in Apache Spark, by docking a publicly available target receptor against ~2.2 million compounds. The performance experiments show a good parallel efficiency (87%) when running in a public cloud environment.

Conclusion: Our method enables parallel structure-based virtual screening on public cloud resources or commodity computer clusters. The degree of scalability that we achieve allows for trying out our method on relatively small libraries first and then scaling to larger libraries. Our implementation is named Spark-VS and is freely available as open source on GitHub (https://github.com/mcapuccini/spark-vs).

[Graphical abstract]
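The MapReduce-style pipeline described in the abstract, streaming a molecular library through an existing docking program in parallel and then reducing to the best-scoring compounds, can be sketched with Spark's Scala API. The snippet below is a minimal illustration under stated assumptions, not the Spark-VS implementation: the docking command `dock_cmd`, its flags, the input paths, and the tab-separated "id score" output format are hypothetical and only stand in for whatever docking tool is plugged in.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DockingSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("docking-sketch"))

    // Load the molecular library, one molecule (e.g. a SMILES string) per line.
    // The path is a placeholder for this example.
    val molecules = sc.textFile("hdfs:///data/library.smi")

    // "Map" step: stream each partition through an external docking program
    // via RDD.pipe. Here "dock_cmd" is a hypothetical CLI that reads molecules
    // on stdin and prints one "id<TAB>score" line per molecule on stdout.
    val scores = molecules
      .pipe(Seq("dock_cmd", "--receptor", "/data/receptor.pdb"))
      .map { line =>
        val Array(id, score) = line.split("\t")
        (id, score.toDouble)
      }

    // "Reduce" step: keep the top-scoring molecules across all partitions
    // (higher score assumed better in this sketch).
    val top = scores.sortBy({ case (_, s) => s }, ascending = false).take(30)
    top.foreach { case (id, s) => println(f"$id%s $s%.2f") }

    sc.stop()
  }
}
```

Spark takes care of partitioning the library, scheduling tasks across the cluster, and re-executing failed tasks, which is what provides the transparent scalability and software-level fault tolerance referred to in the abstract.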
