
Visual data mining: Using parallel coordinate plots with K-means clustering and color to find correlations in a multidimensional dataset.



Abstract

This thesis examines the use of parallel coordinate (PC) plots for visual data mining, concentrating on PC plots of multidimensional data sets. The concepts of the "polyline" and the parallel axis, the basic building blocks of a parallel coordinate plot, are defined. Visualization problems with parallel coordinate plots typically involve ambiguity and clutter. Both issues are addressed with the technique of "clustering and color": color reduces ambiguity, and separating the data set into natural groups, or clusters, reduces clutter. A methodology is outlined that describes how to cluster and color a multidimensional data set. The K-means clustering algorithm is introduced, and its application to produce clusters of polylines in a PC plot is shown. The "K" in K-means is the number of clusters, and its value is user defined. In the spirit of graphical visualization, the "distortion plot" is introduced as a way to select the "best" value of K. Once the methodology of graphing a meaningful parallel coordinate plot is outlined, it is illustrated with an analysis of a real multidimensional data set. The thesis finishes with a summary of the effectiveness and applications of visual data mining using a series of PC plots with clustering and color.
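The "cluster and color" idea described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the thesis's own procedure: the data are synthetic, "distortion" is assumed to mean the within-cluster sum of squared distances, a deterministic farthest-first initialization is swapped in for the usual random start so the run is repeatable, and every name below (`normalize`, `kmeans`, `PALETTE`, and so on) is hypothetical.

```python
import random

def make_data(seed=1):
    """Synthetic 4-dimensional data: three groups of 20 rows each (an assumption)."""
    rng = random.Random(seed)
    groups = [(1, 8, 2, 9), (5, 5, 5, 5), (9, 1, 8, 2)]
    return [[c + rng.gauss(0, 0.4) for c in g] for g in groups for _ in range(20)]

def normalize(rows):
    """Rescale each dimension to [0, 1] so all parallel axes share one height."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.5 for v, l, h in zip(r, lo, hi)]
            for r in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first(rows, k):
    """Deterministic initial centers: start at row 0, then repeatedly take the
    row farthest from the centers chosen so far (not the usual random start)."""
    centers = [list(rows[0])]
    while len(centers) < k:
        nxt = max(rows, key=lambda r: min(sq_dist(r, c) for c in centers))
        centers.append(list(nxt))
    return centers

def kmeans(rows, k, iters=100):
    """Lloyd's iteration; returns (labels, centers, distortion)."""
    centers = farthest_first(rows, k)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(r, centers[j]))
                      for r in rows]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # leave an empty cluster's center where it was
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    distortion = sum(sq_dist(r, centers[lab]) for r, lab in zip(rows, labels))
    return labels, centers, distortion

PALETTE = ["tab:blue", "tab:orange", "tab:green", "tab:red", "tab:purple"]

data = normalize(make_data())

# Values for a distortion plot: distortion against K, to look for an "elbow".
distortions = {k: kmeans(data, k)[2] for k in range(1, 6)}

# One colored polyline per row: (axis index, value) pairs plus a cluster color.
labels, _, _ = kmeans(data, 3)
polylines = [(list(enumerate(row)), PALETTE[lab]) for row, lab in zip(data, labels)]
```

Each entry of `polylines` can be handed to any plotting library that draws a line through the (axis index, value) pairs in the given color, and plotting `distortions` against K gives a curve whose bend suggests a reasonable number of clusters for this synthetic data.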

Bibliographic record

  • Author

    Peterson, Angela R.

  • Author affiliation

    Kutztown University of Pennsylvania

  • Degree-granting institution Kutztown University of Pennsylvania
  • Subject Statistics; Computer Science
  • Degree M.S.
  • Year 2009
  • Pages 59 p.
  • Original format PDF
  • Language English
  • Chinese Library Classification Statistics; Automation and computer technology
  • Date indexed 2022-08-17 11:38:29
