IEEE International Conference on Data Mining Workshops

Parallelizing an Information Theoretic Co-clustering Algorithm Using a Cloud Middleware

Abstract

Emerging cloud environments are well suited for storing and analyzing large datasets because they allow on-demand access to resources. However, developing high-performance implementations of data analysis tasks remains a challenging problem. In our prior work, we developed a middleware called FREERIDE (FRamework for Rapid Implementation of Data mining Engines). FREERIDE is based on the observation that the processing structure of many data mining algorithms involves generalized reductions. It offers a high-level interface and implements both distributed-memory and shared-memory parallelization. In this paper, we consider a challenging new data mining algorithm, information-theoretic co-clustering, and parallelize it using the FREERIDE middleware. We show how the main processing loops of the algorithm's row clustering and column clustering steps can be fit into a generalized reduction structure. We achieve good parallel efficiency, with a speedup of nearly 21 on 32 cores.
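To make the generalized-reduction idea concrete, here is a minimal Python sketch of one row-clustering pass. It assumes the commonly used formulation of information-theoretic co-clustering (Dhillon, Mallela and Modha), in which each row x is reassigned to the row cluster x̂ minimizing KL(p(Y|x) || q(Y|x̂)), with q(y|x̂) = p(y|ŷ) p(ŷ|x̂). All function names are hypothetical, and this is not FREERIDE's interface. Rows are split into partitions; each partition updates its own copy of a reduction object (the new k×l cluster joint distribution plus the weighted loss), and the copies are merged by element-wise addition. Threads stand in for the cores or nodes a real runtime would use. The column-clustering pass is symmetric and omitted, and every row is assumed to have nonzero total probability.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional, Tuple

Matrix = List[List[float]]


def cluster_joint(p: Matrix, row_labels: List[int], col_labels: List[int],
                  k: int, l: int) -> Matrix:
    """p(x_hat, y_hat): total mass of p(x, y) in each (row cluster, column cluster) block."""
    joint = [[0.0] * l for _ in range(k)]
    for x, row in enumerate(p):
        for y, pxy in enumerate(row):
            joint[row_labels[x]][col_labels[y]] += pxy
    return joint


def row_prototypes(p: Matrix, row_labels: List[int], col_labels: List[int],
                   k: int, l: int) -> List[Optional[List[float]]]:
    """q(y | x_hat) = p(y | y_hat) * p(y_hat | x_hat); None marks an empty row cluster."""
    joint = cluster_joint(p, row_labels, col_labels, k, l)
    p_y = [sum(row[y] for row in p) for y in range(len(col_labels))]
    p_yhat = [sum(joint[i][j] for i in range(k)) for j in range(l)]
    protos: List[Optional[List[float]]] = []
    for i in range(k):
        p_xhat = sum(joint[i])
        if p_xhat == 0.0:
            protos.append(None)
            continue
        protos.append([(p_y[y] / p_yhat[c]) * (joint[i][c] / p_xhat) if p_yhat[c] > 0.0 else 0.0
                       for y, c in enumerate(col_labels)])
    return protos


def kl(p_row: List[float], q_row: List[float]) -> float:
    """KL(p || q) in nats; infinite if q is zero where p is positive."""
    total = 0.0
    for a, b in zip(p_row, q_row):
        if a > 0.0:
            if b == 0.0:
                return math.inf
            total += a * math.log(a / b)
    return total


def local_reduction(p: Matrix, rows: range, protos: List[Optional[List[float]]],
                    col_labels: List[int], k: int, l: int) -> Tuple[Dict[int, int], Matrix, float]:
    """Process one partition of rows with no shared writes; return labels and a local reduction object."""
    joint = [[0.0] * l for _ in range(k)]  # reduction object, part 1: new p(x_hat, y_hat)
    loss = 0.0                              # reduction object, part 2: sum of p(x) * divergence
    labels: Dict[int, int] = {}
    for x in rows:
        p_x = sum(p[x])  # assumed > 0 for every row
        p_y_given_x = [v / p_x for v in p[x]]
        div = {i: kl(p_y_given_x, q) for i, q in enumerate(protos) if q is not None}
        best = min(div, key=div.__getitem__)  # ties go to the lowest cluster index
        labels[x] = best
        loss += p_x * div[best]
        for y, pxy in enumerate(p[x]):
            joint[best][col_labels[y]] += pxy
    return labels, joint, loss


def parallel_row_step(p: Matrix, row_labels: List[int], col_labels: List[int],
                      k: int, l: int, num_partitions: int = 2):
    """One row-clustering pass: partition the rows, reduce each partition locally, merge by addition."""
    protos = row_prototypes(p, row_labels, col_labels, k, l)  # read-only, shared by all workers
    chunks = [range(s, len(p), num_partitions) for s in range(num_partitions)]
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        results = list(pool.map(
            lambda rows: local_reduction(p, rows, protos, col_labels, k, l), chunks))
    new_labels = list(row_labels)
    joint = [[0.0] * l for _ in range(k)]
    loss = 0.0
    for labels, local_joint, local_loss in results:  # global combination of reduction objects
        for x, c in labels.items():
            new_labels[x] = c
        joint = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(joint, local_joint)]
        loss += local_loss
    return new_labels, joint, loss


if __name__ == "__main__":
    raw = [[4, 4, 1, 1], [4, 4, 1, 1], [1, 1, 4, 4], [1, 1, 4, 4]]
    p = [[v / 40 for v in row] for row in raw]  # joint distribution p(x, y), sums to 1
    labels, joint, loss = parallel_row_step(p, [0, 0, 0, 1], [0, 0, 1, 1], k=2, l=2)
    print(labels)  # [0, 0, 1, 1]
```

Because the merge is element-wise addition, it is associative and commutative, so the result does not depend on how the rows are partitioned (up to floating-point rounding). That property is what allows a reduction-oriented runtime to distribute the loop across cores or nodes.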