
Clustering Structured Data with the SPARE Library



Abstract

In this paper we propose the SPARE C++ Library as a flexible tool for solving data-driven modelling problems in which the domain space is not necessarily constrained to be the set of real-valued vectors. The possibility of facing Pattern Recognition problems directly on structured domains (multimedia data, strings, graphs) is fundamental to the effective solution of many interesting applications in the fields of content-based retrieval and Knowledge Discovery. As an instance of this particular characteristic of SPARE, we consider a clustering problem defined in the string domain, focusing on the problem of cluster representation in data domains where only a dissimilarity measure can be fixed. To this aim we propose to adopt the MinSOD (Minimum Sum of Distances), defined as the element of a cluster that minimizes the sum of dissimilarities from all the other elements in the considered set. Since the exact computation of the MinSOD has a high computational cost, we propose a suboptimal procedure that computes the representative of the cluster from a reduced pool of samples only, instead of the whole set of objects in the cluster. We have carried out some tests to ascertain the sensitivity of the clustering procedure with respect to the number of samples in the pool used to compute the MinSOD. The results show good robustness of the proposed procedure. The implementation is available as part of the SPARE library, published as an open source project.
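
The MinSOD idea described in the abstract can be illustrated with a minimal C++ sketch. This is not the SPARE API: the names `editDistance`, `minSod`, and `pooledMinSod` are hypothetical, the Levenshtein distance is only one possible string dissimilarity, and drawing the pool uniformly at random is an assumption, since the abstract does not say how the reduced pool is selected.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <numeric>
#include <random>
#include <string>
#include <vector>

// Levenshtein edit distance, used here as an example string dissimilarity
// (an assumption; any dissimilarity measure could be plugged in instead).
std::size_t editDistance(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);
    std::iota(prev.begin(), prev.end(), std::size_t{0});
    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            curr[j] = std::min({prev[j] + 1, curr[j - 1] + 1, subst});
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}

// Exact MinSOD: index of the element whose summed dissimilarity to all other
// elements is smallest. Needs O(n^2) dissimilarity evaluations.
template <typename T, typename Dissimilarity>
std::size_t minSod(const std::vector<T>& items, Dissimilarity d) {
    std::size_t best = 0;
    double bestSum = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < items.size(); ++i) {
        double sum = 0.0;
        for (std::size_t j = 0; j < items.size(); ++j) {
            if (j != i) sum += static_cast<double>(d(items[i], items[j]));
        }
        if (sum < bestSum) {
            bestSum = sum;
            best = i;
        }
    }
    return best;
}

// Approximate representative: draw a random pool of at most poolSize elements
// (hypothetical sampling scheme) and return the MinSOD of that pool only.
template <typename T, typename Dissimilarity>
T pooledMinSod(const std::vector<T>& cluster, std::size_t poolSize,
               Dissimilarity d, std::mt19937& rng) {
    assert(!cluster.empty() && poolSize > 0);
    std::vector<T> pool(cluster);
    std::shuffle(pool.begin(), pool.end(), rng);
    const std::size_t n = std::min(poolSize, pool.size());
    pool.erase(pool.begin() + static_cast<std::ptrdiff_t>(n), pool.end());
    return pool[minSod(pool, d)];
}
```

With a pool of size p drawn from a cluster of n elements, the sketch evaluates on the order of p² dissimilarities rather than n², which is the trade-off the abstract refers to. When `poolSize` is at least the cluster size, `pooledMinSod` returns the exact MinSOD element.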
