International Conference on Parallel Computing Technologies

Parallelization of Algorithms for Mining Data from Distributed Sources

Abstract

We propose an approach to optimizing data mining in modern applications that work on distributed data. We formally transform a high-level functional representation of a data-mining algorithm into a parallel implementation that performs as many computations as possible locally at the data sources, rather than accumulating all data for processing at a central location as in the traditional MapReduce approach. Our approach avoids the main disadvantages of state-of-the-art MapReduce frameworks in the context of distributed data: increased run time, high network traffic, and unauthorized access to data. We use the popular data-mining algorithm Naive Bayes to illustrate our approach and to evaluate it experimentally. Our experiments confirm that the Naive Bayes implementation developed with our approach significantly outperforms the traditional MapReduce-based implementation in both run time and network traffic.
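To make the "compute locally, ship only summaries" idea concrete, here is a minimal sketch of how Naive Bayes training can be split so that raw records never leave their data source. It assumes categorical features and Laplace smoothing with a hypothetical parameter `alpha`. The function names (`local_counts`, `merge_counts`, `predict`) are illustrative only; this is not the authors' implementation, which they derive by formal transformation of a functional specification, and it omits networking entirely. The key property it relies on is that Naive Bayes training reduces to counting, and counts from disjoint sources add up to the counts of the pooled data.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple

# A record is (tuple of categorical feature values, class label).
Record = Tuple[Tuple[Hashable, ...], Hashable]


def local_counts(records: Iterable[Record]) -> Counter:
    """Runs at a data source: summarize raw records into additive counts."""
    counts: Counter = Counter()
    for features, label in records:
        counts[("class", label)] += 1
        for i, value in enumerate(features):
            counts[("feature", i, value, label)] += 1
    return counts


def merge_counts(partials: Iterable[Counter]) -> Counter:
    """Runs centrally: combine per-source summaries (only counts are sent)."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total


def predict(counts: Counter, features: Tuple[Hashable, ...],
            alpha: float = 1.0) -> Hashable:
    """Classify with Laplace-smoothed Naive Bayes using log-probabilities."""
    classes = {key[1]: n for key, n in counts.items() if key[0] == "class"}
    n_total = sum(classes.values())
    # Distinct observed values per feature index, used by the smoothing term.
    domain: Dict[int, set] = {}
    for key in counts:
        if key[0] == "feature":
            domain.setdefault(key[1], set()).add(key[2])
    best_label, best_score = None, -math.inf
    for label, n_c in classes.items():
        score = math.log(n_c / n_total)  # log prior
        for i, value in enumerate(features):
            k = len(domain.get(i, ())) or 1
            # Counter returns 0 for unseen keys without inserting them.
            n_fv = counts[("feature", i, value, label)]
            score += math.log((n_fv + alpha) / (n_c + alpha * k))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    source_a = [(("sunny", "hot"), "no"), (("rainy", "mild"), "yes")]
    source_b = [(("sunny", "mild"), "yes"), (("overcast", "hot"), "yes")]
    model = merge_counts([local_counts(source_a), local_counts(source_b)])
    print(predict(model, ("rainy", "mild")))
```

In this sketch the data shipped from each source grows with the number of distinct (feature, value, class) combinations rather than with the number of records, which illustrates why moving computation to the sources can reduce network traffic and keep raw data local.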
