Journal: BT Technology Journal

A distributed framework for parallel data mining using HPJava


Abstract

Java has become a language of choice for applications executing in heterogeneous environments utilising distributed objects and multithreading. To handle large data sets, scalable and efficient implementations of data mining approaches are required, generally employing computationally intensive algorithms. Conventional Java implementations do not directly provide support for the data structures often encountered in such algorithms, and they also lack repeatability in numerical precision across platforms. This paper describes a distributed framework employing task and data parallelism, and implemented in high performance Java (HPJava). Issues of interest for data mining algorithms are identified, and possible solutions discussed for overcoming limitations in the Java Virtual Machine. The framework supports parallelism across workstation clusters, using the message-passing interface as middleware, and can support different analysis algorithms, wrapped as Java objects, and linked to various databases using the Java database connectivity interface. Guidelines are provided for implementing parallel and distributed data mining on large data sets, and a proof-of-concept data mining application is analysed using a neural network.
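
The data-parallel idea in the abstract, with an analysis algorithm wrapped as a Java object, can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a single JVM and uses standard `java.util.concurrent` threads in place of HPJava and the message-passing interface, and the names `ParallelMiningSketch`, `PartitionAnalysis` and `SumOfSquares` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch only: plain Java threads stand in for the MPI-connected
// cluster nodes, and all names here are hypothetical, not taken from the paper.
final class ParallelMiningSketch {

    // An analysis algorithm wrapped as a Java object: it processes one data
    // partition and returns a partial result that can be summed with others.
    interface PartitionAnalysis {
        double analyse(double[] data, int from, int to);
    }

    // Example algorithm: sum of squares over the half-open range [from, to).
    static final class SumOfSquares implements PartitionAnalysis {
        @Override
        public double analyse(double[] data, int from, int to) {
            double sum = 0.0;
            for (int i = from; i < to; i++) {
                sum += data[i] * data[i];
            }
            return sum;
        }
    }

    // Data parallelism: split the array into contiguous blocks, analyse each
    // block on its own worker, then reduce the partial results in block order.
    static double run(PartitionAnalysis algorithm, double[] data, int workers)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Double>> partials = new ArrayList<>();
            int block = (data.length + workers - 1) / workers; // ceiling division
            for (int w = 0; w < workers; w++) {
                final int from = Math.min(w * block, data.length);
                final int to = Math.min(from + block, data.length);
                Callable<Double> task = () -> algorithm.analyse(data, from, to);
                partials.add(pool.submit(task));
            }
            double total = 0.0;
            for (Future<Double> partial : partials) {
                total += partial.get(); // combine in a fixed order
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        double[] data = {1, 2, 3, 4, 5, 6, 7, 8};
        System.out.println(run(new SumOfSquares(), data, 4));
    }
}
```

Run as written, `main` splits the eight values into four blocks and prints `204.0`. In a cluster setting of the kind the abstract describes, each block would be processed on a separate node and the final loop would become a message-passing reduction; that mapping is an assumption of this sketch, not a detail given in the abstract.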
