...
SIGKDD Explorations

Dependency Clustering Across Measurement Scales


Abstract

How can we automatically spot the major trends in large amounts of heterogeneous data? Clustering can help. However, most existing techniques suffer from one or more of the following drawbacks: 1) many techniques support only one particular data type, most commonly numerical attributes; 2) other techniques do not support attribute dependencies, which are prevalent in real data; 3) some approaches require input parameters that are difficult to estimate; 4) most clustering approaches lack interpretability. To address these challenges, we present Scenic, an algorithm for dependency clustering across measurement scales. Our approach seamlessly integrates heterogeneous data types measured at different scales, most importantly continuous numerical and discrete categorical data. Scenic clusters by arranging objects and attributes in a cluster-specific low-dimensional space. The embedding serves as a compact cluster model from which the original heterogeneous attributes can be reconstructed with high accuracy. In this way, the embedding reveals the major cluster-specific mixed-type attribute dependencies. Following the Minimum Description Length (MDL) principle, the cluster-specific embedding serves as a codebook for effective data compression. This compression-based view automatically balances goodness-of-fit and model complexity, making input parameters redundant. Finally, the embedding serves as a visualization that enhances the interpretability of the clustering result. Extensive experiments demonstrate the benefits of Scenic.
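The abstract's key claim is that an MDL code length trades goodness-of-fit against model complexity, so a parameter such as the number of clusters need not be supplied. The minimal sketch below illustrates only that trade-off on mixed numeric/categorical data; it is not Scenic. Scenic encodes each cluster through a low-dimensional embedding that captures attribute dependencies, whereas this sketch assumes a much simpler per-cluster model that treats attributes as independent (a Gaussian for each numeric attribute, empirical frequencies for each categorical one). The quantization step of 0.01, the (1/2)·log2(n) bits charged per parameter, and all function names are hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict

PRECISION = 0.01  # hypothetical quantization step for numeric values


def numeric_bits(values, precision=PRECISION):
    """Bits to encode numeric values under a Gaussian fitted to them,
    each value discretized to a bin of width `precision`."""
    n = len(values)
    mu = sum(values) / n
    sq = sum((v - mu) ** 2 for v in values)
    # Floor the variance so single-valued or constant groups avoid log(0).
    var = max(sq / n, precision ** 2)
    nll_nats = 0.5 * n * math.log(2 * math.pi * var) + sq / (2 * var)
    # -log2(p(x) * precision) per value = Gaussian NLL in bits + bin cost.
    return nll_nats / math.log(2) - n * math.log2(precision)


def categorical_bits(values):
    """Bits to encode categorical values with their empirical frequencies."""
    n = len(values)
    return -sum(c * math.log2(c / n) for c in Counter(values).values())


def description_length(rows, labels, numeric_cols, categorical_cols):
    """Two-part code length L(model) + L(data | model) for a clustering.
    Hypothetical helper: per-cluster independent attribute models."""
    param_bits = 0.5 * math.log2(len(rows))  # assumed cost per parameter
    clusters = defaultdict(list)
    for row, label in zip(rows, labels):
        clusters[label].append(row)

    model_bits = data_bits = 0.0
    for members in clusters.values():
        for j in numeric_cols:
            data_bits += numeric_bits([r[j] for r in members])
            model_bits += 2 * param_bits  # mean and variance
        for j in categorical_cols:
            column = [r[j] for r in members]
            data_bits += categorical_bits(column)
            model_bits += (len(set(column)) - 1) * param_bits  # free frequencies
    # Cost of the cluster assignment itself (its own parameters ignored for brevity).
    label_bits = categorical_bits(labels)
    return model_bits + label_bits + data_bits


# Toy mixed-type data: (numeric value, color) with two obvious groups.
rows = [(1.0, "red"), (1.1, "red"), (0.9, "red"), (1.05, "red"), (0.95, "red"),
        (10.0, "blue"), (10.2, "blue"), (9.8, "blue"), (10.1, "blue"), (9.9, "blue")]
one_cluster = [0] * 10             # underfits: one broad Gaussian, mixed colors
two_clusters = [0] * 5 + [1] * 5   # matches the structure in the data
singletons = list(range(10))       # overfits: pays for many parameters and labels

for name, labels in [("one", one_cluster), ("two", two_clusters), ("singletons", singletons)]:
    print(name, round(description_length(rows, labels, [0], [1]), 1))
```

On this toy data the two-cluster labeling should receive the shortest total code: the single cluster pays heavily in data bits, while the singleton clustering pays in model and label bits. Comparing candidate clusterings by total bits in this way is what removes the need for a user-chosen number of clusters; Scenic applies the same principle with a richer, dependency-aware cluster model.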
