iVIBRATE: Interactive Visualization-Based Framework for Clustering Large Datasets

KEKE CHEN; LING LIU


Abstract

With continued advances in communication network technology and sensing technology, there is astounding growth in the amount of data produced and made available through cyberspace. Efficient, high-quality clustering of large datasets remains one of the most important problems in large-scale data analysis. A commonly used methodology for cluster analysis on large datasets is the three-phase framework of sampling/summarization, iterative cluster analysis, and disk-labeling. This framework has three known problems that demand effective solutions. The first is how to define and validate irregularly shaped clusters, especially in large datasets; automated algorithms and statistical methods are typically not effective at handling such clusters. The second is how to label the entire dataset on disk (disk-labeling) without introducing additional errors, which includes dealing with outliers, irregular clusters, and cluster boundary extension. The third is the lack of research on how to integrate the three phases effectively. In this article, we describe iVIBRATE, an interactive visualization-based three-phase framework for clustering large datasets. The two main components of iVIBRATE are its VISTA visual cluster-rendering subsystem, which brings human interaction into the large-scale iterative clustering process through interactive visualization, and its adaptive ClusterMap labeling subsystem, which offers visualization-guided disk-labeling solutions that effectively handle outliers, irregular clusters, and cluster boundary extension.
Another important contribution of the iVIBRATE work is the identification of the special issues that arise when integrating the two components and the sampling approach into a coherent framework, together with solutions for improving the framework's reliability and minimizing the errors generated during the cluster analysis process. We study the effectiveness of the iVIBRATE framework through a walkthrough example on a dataset of one million records, and we experimentally evaluate the iVIBRATE approach using both real-life and synthetic datasets. Our results show that iVIBRATE can efficiently involve the user in the clustering process and generate high-quality clustering results for large datasets.
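The abstract names the generic three-phase framework (sampling/summarization, iterative cluster analysis, disk-labeling) without showing how the phases connect. The following is a minimal Python sketch of that generic pipeline under stated assumptions; it is not iVIBRATE's method. Phase 1 uses reservoir sampling, phase 2 substitutes plain k-means for the interactive VISTA rendering step, and phase 3 labels records by nearest centroid with a hypothetical outlier radius instead of the ClusterMap labeling the paper describes. All names and parameters (`reservoir_sample`, `kmeans`, `disk_label`, `synthetic_disk`, `radius`) are illustrative assumptions.

```python
import math
import random
from typing import Iterable, Iterator, List, Sequence, Tuple

Point = Tuple[float, ...]


def reservoir_sample(stream: Iterable[Point], n: int, seed: int = 0) -> List[Point]:
    """Phase 1: draw a uniform random sample of size n in one pass over the data."""
    rng = random.Random(seed)
    sample: List[Point] = []
    for i, p in enumerate(stream):
        if i < n:
            sample.append(p)
        else:
            j = rng.randint(0, i)  # inclusive bounds: keep p with probability n/(i+1)
            if j < n:
                sample[j] = p
    return sample


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Point], k: int, iters: int = 20, seed: int = 0) -> List[Point]:
    """Phase 2 placeholder: Lloyd's k-means on the in-memory sample.
    (iVIBRATE uses interactive visual cluster rendering here instead.)"""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda c: dist(p, centers[c]))
            groups[idx].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[c]
            for c, g in enumerate(groups)
        ]
    return centers


def disk_label(stream: Iterable[Point], centers: List[Point], radius: float) -> Iterator[int]:
    """Phase 3: label every record in one streaming pass.
    Returns the nearest center's index, or -1 (outlier) if it lies beyond `radius`."""
    for p in stream:
        d, idx = min((dist(p, c), i) for i, c in enumerate(centers))
        yield idx if d <= radius else -1


def synthetic_disk(n: int = 10_000, seed: int = 42) -> Iterator[Point]:
    """Hypothetical stand-in for a large on-disk file: two 2-D Gaussian blobs.
    Re-seeding makes every call replay the same records, like re-reading a file."""
    rng = random.Random(seed)
    for _ in range(n):
        cx = rng.choice([0.0, 10.0])
        yield (rng.gauss(cx, 1.0), rng.gauss(cx, 1.0))


sample = reservoir_sample(synthetic_disk(), 500)
centers = kmeans(sample, k=2)
labels = list(disk_label(synthetic_disk(), centers, radius=5.0))
outlier = next(disk_label([(50.0, 50.0)], centers, radius=5.0))  # → -1
```

Nearest-centroid labeling assumes roughly spherical clusters, so it is exactly the kind of automated step the abstract describes as ineffective for irregular clusters and cluster boundary extension; that limitation is what visualization-guided clustering and labeling are meant to address.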


Bibliographic Record

Source
ACM Transactions on Information Systems, 2006, No. 2, pp. 245-294 (50 pages)
Authors
KEKE CHEN; LING LIU
Affiliation
College of Computing, Georgia Institute of Technology, 801 Atlantic Drive, Atlanta, GA 30332

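Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Text language: English
Chinese Library Classification: Computing technology, computer technology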
Keywords
algorithms; design; human factors; reliability;


Similar Documents

1. FUn: a framework for interactive visualizations of large, high-dimensional datasets on the web [J]. Probst Daniel, Reymond Jean-Louis. Bioinformatics, 2018, No. 8.

2. Gpu-based Interactive Visualization Framework For Ultrasound Datasets [J]. Sukhyun Lim, Koojoo Kwon, Byeong-Seok Shin. Computer Animation and Virtual Worlds, 2009, No. 1.

3. A single-pass GPU ray casting framework for interactive out-of-core rendering of massive volumetric datasets [J]. Enrico Gobbetti, Fabio Marton, Jose Antonio Iglesias Guitian. The Visual Computer, 2008, No. 7-9.

4. A Modified Relationship Based Clustering Framework for Density Based Clustering and Outlier Filtering on High Dimensional Datasets [C]. Turgay Tugay Bilgin, A. Yilmaz Camurcu. Advances in Knowledge Discovery and Data Mining, Lecture Notes in Artificial Intelligence 4426, 2007.

5. Supervised precision ordinal clustering – A human-machine learning algorithm to create accurate clusters in big datasets: Application to Indiana water quality data with novel visualization techniques [D]. Singh, Sarabjit. 2014.

6. Correction: clusterExperiment and RSEC: A Bioconductor package and framework for clustering of single-cell and other large gene expression datasets [O]. 2019.

7. iVIBRATE: Interactive visualization based framework for clustering large datasets [O]. Keke Chen, Ling Liu. 2006.
