Journal of Parallel and Distributed Computing

Middleware for data mining applications on clusters and grids

Abstract

This paper gives an overview of two middleware systems that have been developed over the last 6 years to address the challenges involved in developing parallel and distributed implementations of data mining algorithms. FREERIDE (FRamework for Rapid Implementation of Data mining Engines) focuses on data mining in a cluster environment. FREERIDE is based on the observation that parallel versions of several well-known data mining techniques share a relatively similar structure, and can be parallelized by dividing the data instances (or records or transactions) among the nodes. The computation on each node involves reading the data instances in an arbitrary order, processing each data instance, and performing a local reduction. The reduction involves only commutative and associative operations, which means the result is independent of the order in which the data instances are processed. After the local reduction on each node, a global reduction is performed. This similarity in structure can be exploited by the middleware system to execute the data mining tasks efficiently in parallel, starting from a relatively high-level specification of the technique.

To enable processing of data sets stored in remote data repositories, we have extended the FREERIDE middleware into FREERIDE-G (FRamework for Rapid Implementation of Data mining Engines in Grid). FREERIDE-G supports a high-level interface for developing data mining and scientific data processing applications that involve data stored in remote repositories. The added functionality in FREERIDE-G aims at abstracting the details of remote data retrieval, movement, and caching from application developers.
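
The processing structure described in the abstract (divide the data instances among nodes, reduce locally in arbitrary order, then reduce globally) can be illustrated with a minimal Python sketch. This is not the FREERIDE API: the function names `partition`, `local_reduction`, `global_reduction`, and `run` are hypothetical, the "nodes" are simulated as in-process list slices, and item counting over transactions is assumed as a stand-in for a real data mining reduction.

```python
from collections import Counter
from functools import reduce


def partition(records, num_nodes):
    """Divide the data instances among simulated nodes (round-robin)."""
    return [records[i::num_nodes] for i in range(num_nodes)]


def local_reduction(records):
    """Fold one node's records into a reduction object (here: item counts).

    Counting is commutative and associative, so the order in which the
    records are read does not affect the result.
    """
    counts = Counter()
    for transaction in records:
        counts.update(set(transaction))  # count each item once per transaction
    return counts


def global_reduction(partials):
    """Merge the per-node reduction objects into one global result."""
    return reduce(lambda a, b: a + b, partials, Counter())


def run(records, num_nodes):
    """Hypothetical driver: partition, reduce locally, then reduce globally."""
    parts = partition(records, num_nodes)
    return global_reduction(local_reduction(part) for part in parts)


if __name__ == "__main__":
    transactions = [["a", "b"], ["b", "c"], ["a", "b", "c"], ["b"]]
    print(run(transactions, num_nodes=2))
```

Because the merge operation is commutative and associative, changing the number of simulated nodes or the order of the records leaves the global result unchanged, which is the property the abstract relies on.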
