Journal: Information Visualization

A comparative user study of visualization techniques for cluster analysis of multidimensional data sets

Abstract

This article presents an empirical user study that compares eight multidimensional projection techniques for supporting the estimation of the number of clusters, k, embedded in six multidimensional data sets. The selection of the techniques was based on their intended design, or use, for visually encoding data structures, that is, neighborhood relations between data points or groups of data points in a data set. Concretely, we study: the difference between the estimates of k as given by participants when using different multidimensional projections; the accuracy of user estimations with respect to the number of labels in the data sets; the perceived usability of each multidimensional projection; whether user estimates disagree with k values given by a set of cluster quality measures; and whether there is a difference between experienced and novice users in terms of estimates and perceived usability. The results show that: dendrograms (from Ward’s hierarchical clustering) are likely to lead to estimates of k that are different from those given with other multidimensional projections, while Star Coordinates and Radial Visualizations are likely to lead to similar estimates; t-Stochastic Neighbor Embedding is likely to lead to estimates which are closer to the number of labels in a data set; cluster quality measures are likely to produce estimates which are different from those given by users using Ward and t-Stochastic Neighbor Embedding; U-Matrices and reachability plots will likely have a low perceived usability; and there is no statistically significant difference between the answers of experienced and novice users. Moreover, as data dimensionality increases, cluster quality measures are likely to produce estimates which are different from those perceived by users using any of the assessed multidimensional projections. It is also apparent that the inherent complexity of a data set, as well as the capability of each visual technique to disclose such complexity, has an influence on the perceived usability.
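
To make the idea of a cluster quality measure producing its own estimate of k more concrete, the following is a minimal sketch and not the procedure used in the study. The abstract does not name the quality measures it used, so the mean silhouette coefficient is an assumption here, as are the helper names (`ward_partitions`, `silhouette`, `estimate_k`, `make_blobs`) and the synthetic data. The sketch builds partitions with a naive Ward agglomeration and picks the k whose partition scores highest.

```python
import random
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(cluster):
    """Mean point of a cluster (a list of equal-length tuples)."""
    n = len(cluster)
    return [sum(col) / n for col in zip(*cluster)]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    weight = len(a) * len(b) / (len(a) + len(b))
    return weight * sq_dist(centroid(a), centroid(b))


def ward_partitions(points, k_max):
    """Naive Ward agglomeration; returns {k: clusters} for k = 2..k_max."""
    clusters = [[p] for p in points]
    partitions = {}
    while len(clusters) > 2:
        # Merge the pair of clusters with the smallest Ward cost.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
        if len(clusters) <= k_max:
            partitions[len(clusters)] = [list(c) for c in clusters]
    return partitions


def silhouette(clusters):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    scores = []
    for ci, cluster in enumerate(clusters):
        for p in cluster:
            if len(cluster) == 1:
                scores.append(0.0)  # common convention for singleton clusters
                continue
            # a: mean distance to the other members of the same cluster.
            a = sum(sq_dist(p, q) ** 0.5 for q in cluster) / (len(cluster) - 1)
            # b: mean distance to the nearest other cluster.
            b = min(sum(sq_dist(p, q) ** 0.5 for q in other) / len(other)
                    for cj, other in enumerate(clusters) if cj != ci)
            scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def estimate_k(points, k_max=6):
    """Return the k in 2..k_max whose Ward partition has the best silhouette."""
    partitions = ward_partitions(points, k_max)
    return max(partitions, key=lambda k: silhouette(partitions[k]))


def make_blobs(centers, n_per_center=15, seed=0):
    """Hypothetical test data: unit-variance Gaussian blobs around each center."""
    rng = random.Random(seed)
    return [tuple(rng.gauss(c, 1.0) for c in center)
            for center in centers for _ in range(n_per_center)]


# Three well-separated blobs in four dimensions (illustrative data only).
points = make_blobs([(0, 0, 0, 0), (10, 10, 0, 0), (0, 10, 10, 10)])
print("estimated k:", estimate_k(points))
```

A measure of this kind returns a single number per candidate k, whereas a participant reading a dendrogram or a projection judges visual structure. This difference in how the estimate is produced is one possible reason the two can disagree, as the abstract reports.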
