Conference: Artificial neural networks in pattern recognition

Clustering Very Large Dissimilarity Data Sets


Abstract

Clustering and visualization are key issues in computer-supported data inspection, and a variety of promising tools exist for these tasks, such as the self-organizing map (SOM) and its variants. Real-life data, however, pose severe problems for standard data inspection. On the one hand, data are often represented by complex non-vectorial objects, so standard methods for finite-dimensional vectors in Euclidean space cannot be applied. On the other hand, very large data sets have to be dealt with: the data do not fit into main memory, and more than one pass over the data is not affordable, so standard methods cannot be applied because of the sheer amount of data. We present two recent extensions of topographic mappings: relational clustering, which can deal with general proximity data given by pairwise distances, and patch processing, which can process streaming data of arbitrary size in patches. Combined, they yield an efficient linear-time data inspection method for general dissimilarity data structures. We present the theoretical background as well as applications to text and multimedia processing based on the generalized compression distance.
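To make the idea of proximity data given by pairwise distances more concrete, the following minimal sketch computes a compression-based dissimilarity between byte strings. It assumes the widely used normalized compression distance (NCD) with zlib as the compressor; this is a stand-in for illustration and not necessarily the generalized compression distance used in the paper. The function names `compressed_size`, `ncd`, and `dissimilarity_matrix` are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (level 9)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance (an assumed stand-in, see lead-in).

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C is the compressed size. Values near 0 suggest similar
    inputs; values near 1 suggest unrelated inputs.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def dissimilarity_matrix(items: list[bytes]) -> list[list[float]]:
    """Pairwise NCD values, the kind of input a relational method consumes."""
    return [[ncd(a, b) for b in items] for a in items]


if __name__ == "__main__":
    docs = [
        b"the quick brown fox jumps over the lazy dog " * 20,
        b"the quick brown fox leaps over the lazy cat " * 20,
        bytes(range(256)) * 4,
    ]
    for row in dissimilarity_matrix(docs):
        print(" ".join(f"{value:.3f}" for value in row))
```

Note that the full matrix needs quadratically many distance evaluations, which is exactly what becomes infeasible for very large data sets. In a patch-based setting, only the dissimilarities within the current patch would be computed at any one time.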
