Efficient Parallel Data Mining for Massive Datasets: Parallel Random Forests Classifier


Abstract

Data mining is the process of finding hidden patterns in a large dataset. Past research has focused mainly on improving the accuracy of data mining algorithms, but massive dataset sizes pose a further challenge. Parallel and distributed processing techniques have been applied to data mining algorithms to make them scalable. In this paper, we discuss an emerging data mining algorithm, random forests, and its parallelization on VCluster, a portable parallel runtime system we have developed for clusters of multiprocessors. A random forest is an ensemble of many decision trees, and classification is performed by majority vote among those trees. We also present experimental results on the performance of the parallel random forests approach.
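
The ensemble-and-vote idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: VCluster is not described in this record, so a standard-library thread pool stands in for the parallel runtime, and a one-split stump on a randomly chosen feature stands in for a full decision tree. All function names and parameters (`train_stump`, `train_forest`, `predict_forest`, `n_trees`, `workers`) are hypothetical.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def misclassified(counts):
    """Count samples that disagree with the majority label of a branch."""
    return sum(counts.values()) - max(counts.values(), default=0)


def train_stump(task):
    """Fit a one-split tree (a stump) on a bootstrap sample.

    Assumption: a stump on one randomly chosen feature stands in for a
    full decision tree to keep the sketch short.
    """
    rows, labels, seed = task
    rng = random.Random(seed)
    sample = [rng.randrange(len(rows)) for _ in rows]  # draw with replacement
    feature = rng.randrange(len(rows[0]))
    best = None
    for i in sample:
        threshold = rows[i][feature]
        left = Counter(labels[j] for j in sample if rows[j][feature] <= threshold)
        right = Counter(labels[j] for j in sample if rows[j][feature] > threshold)
        errors = misclassified(left) + misclassified(right)
        if best is None or errors < best[0]:
            left_label = left.most_common(1)[0][0]
            right_label = right.most_common(1)[0][0] if right else left_label
            best = (errors, feature, threshold, left_label, right_label)
    return best[1:]


def predict_tree(stump, row):
    """Classify one row with a single stump."""
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label


def train_forest(rows, labels, n_trees=25, workers=4, seed=0):
    """Train the trees independently; each task carries its own seed."""
    tasks = [(rows, labels, seed + t) for t in range(n_trees)]
    # Stand-in for a cluster runtime: trees share no state, so the tasks
    # can be handed to any pool of workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(train_stump, tasks))


def predict_forest(forest, row):
    """Classify one row by majority vote over all trees."""
    votes = Counter(predict_tree(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]
```

Because each tree depends only on the training data and its own seed, the training tasks are independent, which is what makes the ensemble a natural fit for parallel execution. In CPython a thread pool does not speed up CPU-bound work; a process pool or a cluster scheduler would be the realistic substitute.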


Bibliographic details

Source: International Conference on Parallel and Distributed Processing Techniques and Applications, 2005, 7 pages
Authors: Jianyong Dai; Joohan Lee; Morgan C. Wang
Original format: PDF
Chinese Library Classification: Computing technology, computer technology
Keywords: Data mining; Parallel processing; Random forests; Cluster computing

Similar literature

1. Mining massive datasets by an unsupervised parallel clustering on a GRID: Novel algorithms and case study [J]. Alberto Faro, Daniela Giordano, Francesco Maiorana. Future Generation Computer Systems, 2011, no. 6.
2. ASTRAL-MP: scaling ASTRAL to very large datasets using randomization and parallelization [J]. Yin John, Zhang Chao, Mirarab Siavash. Bioinformatics, 2019, no. 20.
3. An efficient parallel row enumerated algorithm for mining frequent colossal closed itemsets from high dimensional datasets [J]. Vanahalli Manjunath K., Patil Nagamma. Information Sciences: An International Journal, 2019.
4. Efficient Parallel Data Mining for Massive Datasets: Parallel Random Forests Classifier [C]. Jianyong Dai, Joohan Lee, Morgan C. Wang. International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA'05), vol. 3; June 27-30, 2005; Las Vegas, NV (US). 2005.
5. Combinatorial Optimization on Massive Datasets: Streaming, Distributed, and Massively Parallel Computation [D]. Assadi, Sepehr. 2018.
6. A new computational method to predict transcriptional activity of a DNA sequence from diverse datasets of massively parallel reporter assays [O]. Ying Liu, Takuma Irie, Tetsushi Yada. 2017.
7. A novel parallel algorithm for frequent itemsets mining in massive small files datasets [O]. Xia, D., Rong, Z., Zhou, Y. 2014.
8. Faster Parallel Algorithm and Efficient Multithreaded Implementations for Evaluating Betweenness Centrality on Massive Datasets [R]. Madduri, K., Ediger, D., Jiang, K. 2008.
