Distributed clustering of categorical data using the information bottleneck framework

Tagasovska Natasa; Andritsos Periklis


Abstract

We perform large-scale clustering of categorical data using the Information Bottleneck (IB) framework, and we examine the performance of existing solutions on multiple machine architectures. The IB method uses information theory to recast database relations as probability distributions, and to express the proximity of tuples as the information lost when they are considered together. More precisely, we study the Agglomerative Information Bottleneck, the Sequential Information Bottleneck, and LIMBO, a newer approach that uses summaries of the original data. First, we evaluate the performance and limitations of these algorithms when confronted with large datasets on a single, powerful machine. We then propose new implementations that take advantage of distributed environments. Finally, we evaluate their effectiveness and efficiency using real and large synthetic datasets tens of gigabytes in size. (C) 2017 Elsevier Ltd. All rights reserved.
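
To make the idea of "proximity as loss of information" concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes the merge cost usually given for the agglomerative IB: the combined weight of two clusters multiplied by the weighted Jensen-Shannon divergence of their conditional distributions. It also assumes that each tuple is represented as a uniform distribution over its (attribute, value) pairs. The names `tuple_distribution`, `kl`, and `merge_loss` are hypothetical and chosen only for illustration.

```python
import math


def tuple_distribution(row):
    """Assumed representation: a tuple (dict attribute -> value) becomes a
    uniform distribution over its (attribute, value) pairs."""
    pairs = list(row.items())
    return {pair: 1.0 / len(pairs) for pair in pairs}


def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in bits.
    Requires q to be positive wherever p is positive."""
    return sum(pv * math.log2(pv / q[v]) for v, pv in p.items() if pv > 0)


def merge_loss(p_i, d_i, p_j, d_j):
    """Information lost by merging two clusters.

    p_i, p_j: prior weights of the clusters (e.g. 1/n for single tuples).
    d_i, d_j: their conditional distributions over (attribute, value) pairs.
    Returns (p_i + p_j) times the weighted Jensen-Shannon divergence.
    """
    w = p_i + p_j
    pi_i, pi_j = p_i / w, p_j / w
    support = set(d_i) | set(d_j)
    # Distribution of the merged cluster: weighted mixture of the two.
    merged = {v: pi_i * d_i.get(v, 0.0) + pi_j * d_j.get(v, 0.0) for v in support}
    js = pi_i * kl(d_i, merged) + pi_j * kl(d_j, merged)
    return w * js


if __name__ == "__main__":
    # Hypothetical toy relation with three tuples of equal weight 1/3.
    rows = [
        {"director": "Scorsese", "genre": "crime"},
        {"director": "Scorsese", "genre": "drama"},
        {"director": "Hitchcock", "genre": "thriller"},
    ]
    dists = [tuple_distribution(r) for r in rows]
    print(merge_loss(1 / 3, dists[0], 1 / 3, dists[1]))
    print(merge_loss(1 / 3, dists[0], 1 / 3, dists[2]))
```

Under these assumptions, merging two identical tuples costs nothing, tuples that share some values cost less to merge than tuples that share none, and an agglomerative procedure would repeatedly merge the pair with the smallest loss.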

Bibliographic record

Source
Information Systems, 2017, No. 12, pp. 161-178 (18 pages)
Authors
Tagasovska Natasa; Andritsos Periklis
Author affiliations

Univ Lausanne, HEC, Dept Informat Syst, CH-1015 Lausanne, Switzerland;

Univ Toronto, Fac Informat, 140 St George St, Toronto, ON M5S 3G6, Canada;

Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Keywords
Distributed clustering; Categorical data; Information Bottleneck
Added to database: 2022-08-18 02:47:43

