...
Engineering Applications of Artificial Intelligence

Sparse Self-Represented Network Map: A fast representative-based clustering method for large dataset and data stream

Abstract

The demand for fast clustering has grown rapidly as ever larger amounts of data have been collected over the last decade. In this paper, we propose the Sparse Self-Represented Network Map, a nonparametric, representative-based method for fast clustering of large datasets. Each node in the network generates a heat map of the dataset by receiving stimulations from data within its Accepting Field. We developed a weight-adjusting method to learn and summarize the clustering pattern of the data. The learned map is then used to compute the clustering result by breaking weak links and finding connected components. Rather than employing an iterative process to find local minima, our network passes over the dataset only once and is able to capture the global pattern of the dataset as well as detect the natural number of clusters. To keep this nonparametric method from suffering the curse of dimensionality, we propose Sparse Dynamic Instantiation: a node or a link is instantiated only when it is stimulated by input data. As a result, the overall complexity is linear in the data dimension. We tested our algorithm on synthetic and real datasets and compared it with popular clustering algorithms (K-means++, Expectation-Maximization, Mean-Shift and StreamKM++) as well as state-of-the-art clustering algorithms (Affinity Propagation and Density Peak). We also applied our clustering algorithm to mobile location clustering, to building a Visual Dictionary for image recognition, and to clustering data streams. Our experiments indicate that our algorithm can be a better alternative to all of the compared popular clustering algorithms, especially when efficiency is the primary consideration: it drastically improves time and space complexity while retaining an equal level of accuracy.
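The abstract describes the clustering step only in outline: one pass over the data, nodes and links created only when stimulated, weak links broken, and connected components read off as clusters. The Python sketch that follows illustrates that general pattern under heavily simplified, hypothetical assumptions; it is not the authors' SSRN algorithm. The function `sketch_cluster` and its parameters `cell` (a grid-cell size standing in for an Accepting Field) and `min_link` (a weak-link threshold) are invented for illustration, and raw stimulation counts stand in for the paper's weight-adjusting method.

```python
import math
from collections import defaultdict


def sketch_cluster(points, cell=1.0, min_link=2):
    """Single-pass, grid-based clustering sketch (NOT the SSRN algorithm).

    Hypothetical simplifications:
      * a "node" is an axis-aligned grid cell of side `cell`, standing in
        for a node and its Accepting Field;
      * nodes and links are stored in dicts and created only when a point
        stimulates them, a rough analogue of Sparse Dynamic Instantiation;
      * raw stimulation counts replace the paper's weight-adjusting method,
        and a link is "weak" if fewer than `min_link` points stimulated it.
    Returns one integer cluster label per input point.
    """
    heat = defaultdict(int)   # node -> number of points falling in it
    links = defaultdict(int)  # (node_a, node_b) -> stimulation count
    homes = []                # node of each point, used for labelling

    # Single pass over the data; each point touches one node and d links.
    for p in points:
        node = tuple(math.floor(x / cell) for x in p)
        heat[node] += 1
        homes.append(node)
        for i, x in enumerate(p):
            # Stimulate the link toward the neighbouring cell on the side
            # of the current cell that the point is closer to.
            step = 1 if x / cell - node[i] >= 0.5 else -1
            nb = node[:i] + (node[i] + step,) + node[i + 1:]
            links[tuple(sorted((node, nb)))] += 1

    # Union-find over nodes that actually received points.
    parent = {n: n for n in heat}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    # Break weak links; merge nodes joined by strong links between non-empty nodes.
    for (a, b), w in links.items():
        if w >= min_link and a in heat and b in heat:
            parent[find(a)] = find(b)

    # Each connected component becomes one cluster label.
    roots = {}
    labels = []
    for node in homes:
        labels.append(roots.setdefault(find(node), len(roots)))
    return labels
```

On two well-separated groups, `[(0.2, 0.2), (0.8, 0.3), (1.1, 0.4), (1.3, 0.2), (10.2, 10.1), (10.7, 10.6), (11.2, 10.4)]`, the sketch returns `[0, 0, 0, 0, 1, 1, 1]`. Because `cell` acts as a resolution parameter, this sketch is not nonparametric in the way the abstract describes the authors' method; it only shows how creating nodes and links on demand keeps their number proportional to the data seen rather than to the exponential number of possible grid cells.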