Journal: ACM Transactions on Information Systems

iVIBRATE: Interactive Visualization-Based Framework for Clustering Large Datasets

Abstract

With continued advances in communication network and sensing technology, the amount of data produced and made available through cyberspace is growing at an astounding rate. Efficient, high-quality clustering of large datasets remains one of the most important problems in large-scale data analysis. A commonly used methodology for cluster analysis on large datasets is the three-phase framework of sampling/summarization, iterative cluster analysis, and disk-labeling. This framework has three known problems that demand effective solutions. The first is how to effectively define and validate irregularly shaped clusters, especially in large datasets; automated algorithms and statistical methods are typically not effective in handling such clusters. The second is how to label the entire dataset on disk (disk-labeling) without introducing additional errors, which includes dealing with outliers, irregular clusters, and cluster boundary extension. The third is the lack of research on how to effectively integrate the three phases. In this article, we describe iVIBRATE, an interactive visualization-based three-phase framework for clustering large datasets. The two main components of iVIBRATE are its VISTA visual cluster-rendering subsystem, which brings human interaction into the large-scale iterative clustering process through interactive visualization, and its adaptive ClusterMap labeling subsystem, which offers visualization-guided disk-labeling solutions that are effective in dealing with outliers, irregular clusters, and cluster boundary extension.
Another important contribution of iVIBRATE is the identification of the special issues that arise in integrating the two components and the sampling approach into a coherent framework, together with solutions for improving the reliability of the framework and for minimizing the number of errors generated during the cluster analysis process. We study the effectiveness of the iVIBRATE framework through a walkthrough on an example dataset of one million records, and we experimentally evaluate the iVIBRATE approach using both real-life and synthetic datasets. Our results show that iVIBRATE can efficiently involve the user in the clustering process and generate high-quality clustering results for large datasets.
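
To make the generic three-phase framework named in the abstract (sampling/summarization, iterative cluster analysis, disk-labeling) more concrete, the following is a minimal sketch in Python. It is not the iVIBRATE method: the interactive VISTA and ClusterMap components are replaced here by fully automated stand-ins, namely uniform random sampling, plain k-means, and nearest-centroid labeling with a distance threshold for outliers. All function names and parameters (`sample_records`, `kmeans`, `label_all`, `outlier_radius`) are hypothetical and chosen only for illustration.

```python
import random
from math import dist  # Euclidean distance, Python 3.8+


def sample_records(records, n, seed=0):
    """Phase 1 (assumed stand-in): draw a uniform random sample of the data."""
    rng = random.Random(seed)
    return rng.sample(records, min(n, len(records)))


def kmeans(points, k, iters=20, seed=0):
    """Phase 2 (assumed stand-in): plain k-means on the sample."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            groups[nearest].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


def label_all(records, centroids, outlier_radius):
    """Phase 3 (assumed stand-in): label every record by its nearest centroid.

    Records farther than outlier_radius from every centroid get label -1.
    """
    labels = []
    for p in records:
        i = min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
        labels.append(i if dist(p, centroids[i]) <= outlier_radius else -1)
    return labels


if __name__ == "__main__":
    rng = random.Random(42)
    # Two synthetic 2-D blobs; in practice the records would be streamed from disk.
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
    data += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(500)]
    sample = sample_records(data, 100)
    centroids = kmeans(sample, k=2)
    labels = label_all(data + [(100.0, 100.0)], centroids, outlier_radius=5.0)
    print(centroids, labels.count(-1))
```

The sketch also hints at why the abstract singles out irregular clusters and boundary extension as problems: nearest-centroid labeling assumes roughly spherical clusters, and a boundary learned from a small sample may be too tight for the full dataset.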
