Parallelizing the execution of native data mining algorithms for computational biology

Gianpaolo Coro; Leonardo Candela; Pasquale Pagano; Angela Italiano; Loredana Liccardo

Abstract

Data mining is being increasingly used in biology. Biologists are adopting prototyping languages, like R and Matlab, to facilitate the application of data mining algorithms to their data. As a result, their scripts are becoming increasingly complex and also require frequent updates. Application to large datasets becomes impractical and the time-to-paper increases. Furthermore, even if there are various systems that can be used to efficiently process large datasets, for example, using Cloud and High Performance Computing, they usually require procedures to be translated into specific languages or to be adapted to a certain computing platform. Such modifications can speed up the processing, but translation is not automatic, especially in complex cases, and can require a large amount of programming effort and accurate validation. In this paper, we propose an approach to parallelize data mining procedures in the form of compiled software or R scripts developed by biology communities of practice. Our approach requires minimal alteration of the original code. In many cases, there is no need for code modification. Furthermore, it allows for fast updating when a new version is ready. We clarify the constraints and the benefits of our method and report a practical use case to demonstrate such benefits compared with a standard execution. Our approach relies on a distributed network of web services and ultimately exposes the algorithms as-a-Service, to be invoked by remote thin clients.

Bibliographic details

Source
Concurrency and Computation: Practice and Experience | 2015, Issue 17 | pp. 4630-4644 | 15 pages
Authors
Gianpaolo Coro; Leonardo Candela; Pasquale Pagano; Angela Italiano; Loredana Liccardo
Author affiliations

Istituto di Scienza e Tecnologie dell’Informazione “A. Faedo” – CNR, via G. Moruzzi 1, 56124 Pisa, Italy;

Istituto di Scienza e Tecnologie dell’Informazione “A. Faedo” – CNR, Pisa, Italy;

Istituto di Scienza e Tecnologie dell’Informazione “A. Faedo” – CNR, Pisa, Italy;

Istituto di Scienza e Tecnologie dell’Informazione “A. Faedo” – CNR, Pisa, Italy;

Istituto di Scienza e Tecnologie dell’Informazione “A. Faedo” – CNR, Pisa, Italy;

Original format: PDF
Language: English
Keywords
data mining; parallel processing; cloud computing; computational biology; distributed systems; R
