Contributions to parallel and distributed computing in knowledge discovery and data mining.

Abstract

Databases continue to grow without bound, driven by new data acquisition technologies, and one challenge is how to extract knowledge from such large data sets. In this thesis, we analyze and improve algorithmic solutions to four problems in knowledge discovery and data mining using parallel computing, and we compare our results with related work. First, we design two parallel algorithms for outlier detection: one finds distance-based outliers using nested loops together with randomization and a pruning rule, and the other detects density-based local outliers. Both use data parallelism. Second, the star coordinates plot is a useful visualization technique, but it has some drawbacks. We enhance the traditional star coordinates plot by introducing new parameters that allow data points to be visualized as polygons in two dimensions and as polyhedra in three dimensions. A parallel algorithm is also designed to visualize large data sets and reduce computation time. Third, we design a new meta-classifier algorithm and compare its performance with base classifier algorithms and bagging-based meta-classifier algorithms. Our meta-classifier gives better results than the other meta-classifier algorithms, and a parallel version is developed to speed up computation and make it suitable for large data sets. Fourth, we develop a meta-clustering algorithm and compare its performance with two bagging-based meta-clustering algorithms and a hypergraph-partitioning meta-clustering algorithm. The proposed meta-clustering algorithm gives results close to those of the best clustering algorithm and is more robust to the data dependency problem. A parallel algorithm to compute the four meta-clustering algorithms is also designed.

Our sequential and parallel programs were tested on two different clusters of Linux-based workstations using real-world databases from the Machine Learning Repository of the University of California at Irvine.
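
As an illustration of the first outlier-detection idea named in the abstract (nested loops with randomization and a pruning rule), the following is a minimal sequential sketch. It is not the thesis's algorithm or code. The outlier score (mean distance to the k nearest neighbours), the function name `top_outliers`, and the parameters `k`, `n_out`, and `seed` are all assumptions made for this example, and the data-parallel part described in the abstract is not shown.

```python
import heapq
import math
import random


def top_outliers(points, k=2, n_out=1, seed=0):
    """Return [(score, index), ...] for the n_out strongest outliers.

    Assumption: the score of a point is the mean distance to its k nearest
    neighbours. This definition is chosen for illustration only.
    """
    order = list(range(len(points)))
    # Randomization: scan the data in random order so that close
    # neighbours tend to be found early and pruning triggers sooner.
    random.Random(seed).shuffle(order)

    top = []      # min-heap of (score, index) for the best outliers so far
    cutoff = 0.0  # score of the weakest outlier currently kept

    for i in order:
        nearest = []  # negated distances: a max-heap of the k smallest
        pruned = False
        for j in order:
            if i == j:
                continue
            d = math.dist(points[i], points[j])
            if len(nearest) < k:
                heapq.heappush(nearest, -d)
            elif d < -nearest[0]:
                heapq.heapreplace(nearest, -d)
            # Pruning rule: the running score can only shrink as more
            # points are scanned, so once it drops below the cutoff the
            # candidate cannot enter the top n_out and the scan stops.
            if len(nearest) == k and len(top) == n_out:
                if -sum(nearest) / k < cutoff:
                    pruned = True
                    break
        if pruned or len(nearest) < k:
            continue
        score = -sum(nearest) / k
        if len(top) < n_out:
            heapq.heappush(top, (score, i))
        elif score > top[0][0]:
            heapq.heapreplace(top, (score, i))
        if len(top) == n_out:
            cutoff = top[0][0]
    return sorted(top, reverse=True)
```

Calling `top_outliers([(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)], k=2, n_out=1)` returns a single `(score, index)` pair whose index is 4, the isolated point. One plausible way to apply data parallelism to such a scheme is to split the candidate points across workers and share the cutoff between them, but that design is an assumption here and is not taken from the thesis.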

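Record details

  • Author: Lozano Inca, Elio.
  • Author affiliation: University of Puerto Rico, Mayaguez (Puerto Rico).
  • Degree-granting institution: University of Puerto Rico, Mayaguez (Puerto Rico).
  • Subject: Statistics; Computer Science.
  • Degree: Ph.D.
  • Year: 2006
  • Pages: 131 p.
  • Total pages: 131
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification: Statistics; Automation and computer technology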
