Conference: International Conference on Parallel and Distributed Processing Techniques and Applications

Efficient Parallel Data Mining for Massive Datasets: Parallel Random Forests Classifier

Abstract

Data mining is the process of finding hidden patterns in a large dataset. While past research has focused mainly on improving the accuracy of data mining algorithms, massive dataset sizes pose a further challenge. Parallel and distributed processing techniques have been applied to data mining algorithms to make them scalable. In this paper, we discuss an emerging data mining algorithm, random forests, and its parallelization based on VCluster, a portable parallel runtime system we have developed for a cluster of multiprocessors. A random forest is an ensemble of many decision trees, and classification is performed by majority vote among those trees. We also present experimental results on the performance of the parallel random forests approach.
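
The two ideas named in the abstract, independent tree construction and majority voting, can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not describe VCluster's interface, so Python's standard `concurrent.futures` thread pool stands in for the parallel runtime here, and one-split decision stumps stand in for full decision trees. All function names, the toy `DATA` set, and the parameters `n_trees` and `workers` are assumptions made for illustration.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Toy dataset (an assumption for illustration): two well-separated classes,
# each item is (features, label).
DATA = [((i * 0.1, i * 0.2), 0) for i in range(10)] + \
       [((5 + i * 0.1, 5 + i * 0.2), 1) for i in range(10)]


def train_stump(data, seed):
    """Fit a one-split stump on a bootstrap sample, using one random feature."""
    rng = random.Random(seed)
    sample = [rng.choice(data) for _ in data]       # sample with replacement
    feature = rng.randrange(len(data[0][0]))        # random feature selection
    best = None
    for x, _ in sample:
        threshold = x[feature]
        left = [y for xs, y in sample if xs[feature] <= threshold]
        right = [y for xs, y in sample if xs[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return feature, threshold, left_label, right_label


def predict_stump(stump, x):
    feature, threshold, left_label, right_label = stump
    return left_label if x[feature] <= threshold else right_label


def train_forest(data, n_trees=25, workers=4):
    """Build the stumps concurrently; each one depends only on its own seed."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda seed: train_stump(data, seed), range(n_trees)))


def predict_forest(forest, x):
    """Classify x by majority vote over all stumps in the forest."""
    votes = Counter(predict_stump(stump, x) for stump in forest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    forest = train_forest(DATA)
    print(predict_forest(forest, (0.0, 0.0)), predict_forest(forest, (6.0, 7.0)))
```

Because each tree is trained from its own bootstrap sample with no shared state, the training loop maps directly onto independent workers. A thread pool in CPython mainly shows the structure rather than delivering a speedup, so a process pool or a cluster runtime would be the natural substitute for CPU-bound training.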