Contributions to parallel and distributed computing in knowledge discovery and data mining.

Abstract

Databases have recently been growing continuously and without bound as a result of new data acquisition technologies. One challenge is how to extract knowledge from these large data sets. In this thesis, we analyze and improve the algorithmic solutions to four problems in knowledge discovery and data mining by making use of parallel computing, and we compare our results with related work. We design two parallel algorithms for outlier detection. The first finds distance-based outliers using nested loops combined with randomization and a pruning rule; the second detects density-based local outliers. Both use data parallelism. The star coordinates plot is a useful visualization technique, but it has some drawbacks. We enhance the traditional star coordinates plot by introducing new parameters that allow data points to be visualized as polygons in two dimensions and as polyhedra in three dimensions. To visualize large data sets and reduce the computation time, we also design a parallel version of this algorithm. We design a new meta-classifier algorithm and compare its performance with base classifier algorithms and bagging-based meta-classifier algorithms; our meta-classifier gives better results than the other meta-classifier algorithms. To speed up its computation and make it suitable for large data sets, we also develop a parallel version of it. Finally, we develop a meta-clustering algorithm and compare its performance with two bagging-based meta-clustering algorithms and a hypergraph-partitioning meta-clustering algorithm. Our proposed meta-clustering algorithm gives results close to those of the best clustering algorithm and is more robust to the data dependency problem.
A parallel algorithm to compute all four meta-clustering algorithms is also designed.

The collection of sequential and parallel programs was tested experimentally on two different clusters of Linux-based workstations, using real-world databases from the Machine Learning Repository of the University of California, Irvine.
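The distance-based outlier search described in the abstract can be illustrated with a minimal sequential sketch. It assumes Euclidean distance and defines an outlier's score as the distance to its k-th nearest neighbor, following the common nested-loop scheme with a randomized scan order and a cutoff-based pruning rule. The thesis's exact outlier definition and pruning rule may differ, and its data-parallel version is not shown. The function name `top_n_distance_outliers` and the parameters `k`, `n`, and `seed` are hypothetical.

```python
import heapq
import math
import random


def top_n_distance_outliers(points, k=5, n=3, seed=0):
    """Return the n points (as (score, index)) with the largest k-th nearest-neighbor distance.

    Hypothetical sketch: nested loops, randomized scan order, cutoff pruning.
    Assumes len(points) > k and Euclidean distance.
    """
    rng = random.Random(seed)
    order = list(range(len(points)))
    rng.shuffle(order)  # randomization: scan the data in random order

    outliers = []  # min-heap of (score, index) holding the current top-n
    cutoff = 0.0   # score of the weakest outlier in the current top-n

    for i in order:
        neighbors = []  # negated distances: -neighbors[0] is the running k-th NN distance
        pruned = False
        for j in order:
            if i == j:
                continue
            d = math.dist(points[i], points[j])
            if len(neighbors) < k:
                heapq.heappush(neighbors, -d)
            elif d < -neighbors[0]:
                heapq.heapreplace(neighbors, -d)
            # Pruning rule: the running k-NN distance only shrinks, so once it is
            # below the cutoff, point i can no longer enter the top-n outliers.
            if len(neighbors) == k and -neighbors[0] < cutoff:
                pruned = True
                break
        if pruned:
            continue
        score = -neighbors[0]  # exact distance to the k-th nearest neighbor
        if len(outliers) < n:
            heapq.heappush(outliers, (score, i))
        elif score > outliers[0][0]:
            heapq.heapreplace(outliers, (score, i))
        if len(outliers) == n:
            cutoff = outliers[0][0]

    return sorted(outliers, reverse=True)
```

Because a candidate's running k-nearest-neighbor distance can only decrease as more points are scanned, a candidate whose running value falls below the weakest current top-n score can be discarded early; scanning in random order makes such early termination likely for ordinary points.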

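For the density-based case, the abstract does not name a specific score. Assuming the widely used Local Outlier Factor (LOF) of Breunig et al., a brute-force sketch looks like this. The function name `local_outlier_factors` and the neighborhood size `k` are hypothetical, distances are Euclidean, and the quadratic distance matrix limits it to small inputs; it is not the thesis's parallel algorithm.

```python
import math


def local_outlier_factors(points, k=3):
    """Brute-force Local Outlier Factor (LOF) for every point.

    Assumes len(points) > k and that no point has k or more exact duplicates
    (otherwise a reachability-distance sum can be zero).
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    k_dist = []         # k-distance: distance to the k-th nearest neighbor
    neighborhoods = []  # k-distance neighborhood (ties can make it larger than k)
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]
        k_dist.append(kd)
        neighborhoods.append([j for d, j in others if d <= kd])

    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(i, j) = max(k-distance(j), d(i, j)).
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighborhoods[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean ratio of the neighbors' densities to the point's own density.
    return [
        sum(lrd[j] for j in neighborhoods[i]) / (len(neighborhoods[i]) * lrd[i])
        for i in range(n)
    ]
```

Scores near 1 mean a point is about as dense as its neighbors, while scores well above 1 flag local outliers. A data-parallel variant could, for instance, split the rows of the distance computation among workers, though that is an assumption rather than the design described in the thesis.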

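The traditional star coordinates plot that the thesis builds on can be sketched as follows, assuming the usual layout: d axes spaced evenly around a circle, each attribute min-max normalized to [0, 1], and each point placed at the sum of its normalized values times the axis unit vectors. The polygon and polyhedron extensions introduced in the thesis are not reproduced here, and the function name `star_coordinates` is hypothetical.

```python
import math


def star_coordinates(rows):
    """Map d-dimensional rows to 2-D points using the traditional star coordinates layout."""
    d = len(rows[0])
    # One unit axis vector per attribute, evenly spaced around the circle.
    axes = [(math.cos(2 * math.pi * j / d), math.sin(2 * math.pi * j / d)) for j in range(d)]
    lo = [min(r[j] for r in rows) for j in range(d)]
    hi = [max(r[j] for r in rows) for j in range(d)]

    projected = []
    for r in rows:
        x = y = 0.0
        for j, (ax, ay) in enumerate(axes):
            span = hi[j] - lo[j]
            v = (r[j] - lo[j]) / span if span else 0.0  # min-max normalization
            x += v * ax
            y += v * ay
        projected.append((x, y))
    return projected
```

For example, `star_coordinates([(0, 0, 0, 0), (1, 1, 1, 1), (1, 0, 0, 0)])` places the third row at (1.0, 0.0), while the first two rows, which differ in every attribute, both land at (or within rounding error of) the origin. Such overlap of distinct points is a well-known limitation of a linear projection of this kind; whether it is among the drawbacks the thesis addresses is not stated in the abstract.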
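Bibliographic details

Author: Lozano Inca, Elio
Author affiliation: University of Puerto Rico, Mayaguez (Puerto Rico)
Degree-granting institution: University of Puerto Rico, Mayaguez (Puerto Rico)
Subjects: Statistics; Computer Science
Degree: Ph.D.
Year: 2006
Pages: 131
Original format: PDF
Language: English
Chinese Library Classification: Statistics; Automation technology, computer technology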

Similar literature

1. A special issue of Journal of Parallel and Distributed Computing: Models and algorithms for high-performance distributed data mining [J]. Alfredo Cuzzocrea. Journal of Parallel and Distributed Computing, 2011, no. 5.
2. Special Issue of the Journal of Parallel and Distributed Computing: Data-Intensive Computing [J]. Surendra Byna, Xian-He Sun. Journal of Parallel and Distributed Computing, 2009, no. 11.
3. Parallelizing K-Anonymity Algorithm for Privacy Preserving Knowledge Discovery from Big Data [J]. Y. Sowmya, M. Nagaratna. International Journal of Applied Engineering Research, 2016, no. 2aPta6.
4. From data collection to knowledge data discovery: a medical application of data mining [C]. Duhamel A, Picavet M, Devos P. MEDINFO, 2001.
5. A Cloud Computing Based Platform for Geographically Distributed Health Data Mining [D]. Guo, Yunyong. 2013.
6. HTSFinder: Powerful Pipeline of DNA Signature Discovery by Parallel and Distributed Computing [O]. Ramin Karimi, Andras Hajdu. 2016.
7. Contributions to Desktop Grid Computing: From High Throughput Computing to Data-Intensive Sciences on Hybrid Distributed Computing Infrastructures [O]. Fedak Gilles. 2015.
8. Intensive Knowledge Discovery from Heterogeneous Distributed Data and Knowledge [R]. Capraro, G. T., Berdan, G. B. 2001.
