Hierarchical clustering of massive, high dimensional data sets by exploiting ultrametric embedding

Abstract

Coding of data, usually upstream of data analysis, has crucial implications for the data analysis results. By modifying the data coding – through use of less than full precision in data values – we can aid appreciably the effectiveness and efficiency of the hierarchical clustering. In our first application, this is used to lessen the quantity of data to be hierarchically clustered. The approach is a hybrid one, based on hashing and on the Ward minimum variance agglomerative criterion. In our second application, we derive a hierarchical clustering from relationships between sets of observations, rather than the traditional use of relationships between the observations themselves. This second application uses embedding in a Baire space, or longest common prefix ultrametric space. We compare this second approach, which is of O(n log n) complexity, to k-means.
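
To make the longest common prefix idea concrete, the following is a minimal sketch in Python and not the authors' implementation. It assumes each observation has already been coded as a fixed-length string of decimal digits (its value at reduced precision), and it assumes a distance of the form base^(-k), where k is the length of the shared prefix and the base of 2 is an illustrative choice. The function names `common_prefix_len`, `baire_distance`, and `prefix_hierarchy` are hypothetical.

```python
from collections import defaultdict


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters that a and b share."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n


def baire_distance(a: str, b: str, base: float = 2.0) -> float:
    """Longest-common-prefix distance between two digit strings.

    Assumption: distance is base ** (-k) for a shared prefix of length k,
    and 0 for identical strings. The base is an illustrative choice.
    """
    if a == b:
        return 0.0
    return base ** (-common_prefix_len(a, b))


def prefix_hierarchy(codes: list[str], depth: int) -> list[dict[str, list[int]]]:
    """Group observation indices by shared prefix, one level per digit.

    levels[k - 1] maps every k-digit prefix to the indices of the
    observations that start with it. Each level is built with a single
    hashing pass over the data; clusters at level k nest inside those
    at level k - 1, which gives the hierarchy.
    """
    levels = []
    for k in range(1, depth + 1):
        buckets = defaultdict(list)
        for i, code in enumerate(codes):
            buckets[code[:k]].append(i)
        levels.append(dict(buckets))
    return levels


# Hypothetical data: four observations coded to four digits of precision.
codes = ["4253", "4267", "4912", "1305"]
levels = prefix_hierarchy(codes, depth=2)
```

In this sketch the first two codes share the prefix `42`, so they fall in the same cluster at both levels, while the third joins them only at the coarser one-digit level. Because clusters are read off from shared prefixes, no pairwise distance matrix is needed to build the hierarchy.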