Journal: Concurrency and Computation: Practice and Experience

Parallelizing the execution of native data mining algorithms for computational biology


Abstract

Data mining is being increasingly used in biology. Biologists are adopting prototyping languages, like R and Matlab, to facilitate the application of data mining algorithms to their data. As a result, their scripts are becoming increasingly complex and also require frequent updates. Application to large datasets becomes impractical and the time-to-paper increases. Furthermore, even if there are various systems that can be used to efficiently process large datasets, for example, using Cloud and High Performance Computing, they usually require procedures to be translated into specific languages or to be adapted to a certain computing platform. Such modifications can speed up the processing, but translation is not automatic, especially in complex cases, and can require a large amount of programming effort and accurate validation. In this paper, we propose an approach to parallelize data mining procedures in the form of compiled software or R scripts developed by biology communities of practice. Our approach requires minimal alteration of the original code. In many cases, there is no need for code modification. Furthermore, it allows for fast updating when a new version is ready. We clarify the constraints and the benefits of our method and report a practical use case to demonstrate such benefits compared with a standard execution. Our approach relies on a distributed network of web services and ultimately exposes the algorithms as-a-Service, to be invoked by remote thin clients.
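
The general idea of parallelizing a native procedure without touching its code can be pictured with a minimal Python sketch. This is not the authors' system, which the abstract describes as a distributed network of web services; it is a local, single-machine illustration under two assumptions that are ours: the unmodified program reads records on standard input and writes one result line per record, and records can be processed independently of each other. The function names `split_rows`, `run_native` and `parallel_run` are hypothetical.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def split_rows(rows, n_chunks):
    """Split rows into at most n_chunks contiguous, near-equal chunks."""
    if not rows:
        return []
    n_chunks = max(1, min(n_chunks, len(rows)))
    size, extra = divmod(len(rows), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks


def run_native(command, chunk):
    """Run the unmodified program once, feeding one chunk on stdin."""
    result = subprocess.run(
        command,
        input="\n".join(chunk) + "\n",
        capture_output=True,
        text=True,
        check=True,  # raise if the native program exits with an error
    )
    return result.stdout.splitlines()


def parallel_run(command, rows, n_workers=4):
    """Run `command` on each chunk concurrently and merge outputs in order."""
    chunks = split_rows(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() yields results in chunk order, so output lines
        # stay aligned with the input rows.
        outputs = pool.map(lambda c: run_native(command, c), chunks)
        return [line for out in outputs for line in out]


if __name__ == "__main__":
    # Stand-in for a native script (assumption): upper-cases each input line.
    # A real call might look like ["Rscript", "analysis.R"], where the
    # script name is hypothetical.
    demo = [sys.executable, "-c",
            "import sys\nfor line in sys.stdin: print(line.strip().upper())"]
    print(parallel_run(demo, ["acgt", "ttga", "ccat"], n_workers=2))
```

Threads are sufficient here because the heavy work happens in the child processes; the wrapper only partitions the input, launches the same program several times, and concatenates the outputs. A procedure whose result depends on the whole dataset at once would not fit this simple partitioning.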
