LIMBO: Scalable Clustering of Categorical Data

Abstract

Clustering is a problem of great practical importance in numerous applications. The problem of clustering becomes more challenging when the data is categorical, that is, when there is no inherent distance measure between data values. We introduce LIMBO, a scalable hierarchical categorical clustering algorithm that builds on the Information Bottleneck (IB) framework for quantifying the relevant information preserved when clustering. As a hierarchical algorithm, LIMBO has the advantage that it can produce clusterings of different sizes in a single execution. We use the IB framework to define a distance measure for categorical tuples and we also present a novel distance measure for categorical attribute values. We show how the LIMBO algorithm can be used to cluster both tuples and values. LIMBO handles large data sets by producing a memory bounded summary model for the data. We present an experimental evaluation of LIMBO, and we study how clustering quality compares to other categorical clustering algorithms. LIMBO supports a trade-off between efficiency (in terms of space and time) and quality. We quantify this trade-off and demonstrate that LIMBO allows for substantial improvements in efficiency with negligible decrease in quality.
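
The abstract does not spell out the IB-based distance, so the following is only a minimal sketch of one common choice: the agglomerative Information Bottleneck merge cost, which is the Jensen-Shannon divergence between two clusters' conditional distributions, weighted by their combined probability mass. The function names, and the encoding of a tuple as a uniform distribution over its attribute values, are assumptions made for illustration and are not taken from this record.

```python
import math


def kl_divergence(p, q):
    """KL divergence D_KL(p || q) in bits; terms with p[i] == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def merge_cost(w1, p1, w2, p2):
    """Information lost by merging two clusters (assumed IB-style distance).

    w1, w2: cluster probabilities p(c), both assumed > 0.
    p1, p2: conditional distributions p(a | c) over the same value vocabulary.
    """
    total = w1 + w2
    pi1, pi2 = w1 / total, w2 / total
    # Distribution of the merged cluster: weighted mixture of the two inputs.
    mixed = [pi1 * a + pi2 * b for a, b in zip(p1, p2)]
    js = pi1 * kl_divergence(p1, mixed) + pi2 * kl_divergence(p2, mixed)
    return total * js


def tuple_distribution(row, vocabulary):
    """Hypothetical encoding: uniform mass on the values present in the tuple."""
    return [1.0 / len(row) if v in row else 0.0 for v in vocabulary]


# Toy data: three tuples over two categorical attributes.
vocabulary = ["genre=drama", "genre=comedy", "lang=en", "lang=fr"]
rows = [
    {"genre=drama", "lang=en"},
    {"genre=drama", "lang=fr"},
    {"genre=comedy", "lang=fr"},
]
dists = [tuple_distribution(r, vocabulary) for r in rows]
weight = 1.0 / len(rows)  # each tuple starts as its own equally likely cluster

cost_share_one = merge_cost(weight, dists[0], weight, dists[1])  # share genre
cost_share_none = merge_cost(weight, dists[0], weight, dists[2])  # share nothing
```

Under these assumptions, two tuples with identical values merge at zero cost, and tuples that share fewer values cost more to merge, so a hierarchical procedure would merge the cheapest pair first.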

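Bibliographic details

Source
International Conference on Extending Database Technology (EDBT 2004); March 14-18, 2004; Heraklion, GR | 2004 | pp. 123-146 | 24 pages
Conference location: Crete (GR)
Authors
Periklis Andritsos; Panayiotis Tsaparas; Renee J. Miller; Kenneth C. Sevcik
Author affiliation
University of Toronto, Department of Computer Science
Conference organizer: (not listed)
Original format: PDF
Language: English
Chinese Library Classification: Various special-purpose databases
Keywords: (not listed)
Date added to database: 2022-08-26 14:11:46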

Similar literature

1. A SCALABLE CLUSTERING METHOD FOR CATEGORICAL SEQUENCE DATA [J]. Seung-Joon Oh, Jae-Yearn Kim. International Journal of Computational Methods. 2005, No. 2
2. Clustering categorical and numerical data: a new procedure using multidimensional scaling [J]. Sung-Gi Lee, Deok-Kyun Yun. International Journal of Information Technology & Decision Making. 2003, No. 1
3. CLUSTER ANALYSIS OF ENVIRONMENTAL DATA WHICH IS NOT INTERVAL SCALED BUT CATEGORICAL: EVALUATION OF AERIAL PHOTOGRAPHS OF GROYNEFIELDS FOR THE DETERMINATION OF REPRESENTATIVE SAMPLING SITES [J]. Hannappel S., Piepho B. Chemosphere. 1996, No. 2
4. LIMBO: Scalable Clustering of Categorical Data [C]. Periklis Andritsos, Panayiotis Tsaparas, Renee J. Miller. International Conference on Extending Database Technology. 2004
5. Automatic categorical data clustering and spatial data clustering by consecutive resolution refinement [D]. Foss, Andrew Philip Ogilvie. 2002
6. Evaluation of Modified Categorical Data Fuzzy Clustering Algorithm on the Wisconsin Breast Cancer Dataset [O]. Amir Ahmad. 2016
7. LIMBO: Scalable clustering of categorical data [O]. Periklis Andritsos, Panayiotis Tsaparas, Renée J. Miller. 2004
