【Conference】International Conference on Ubiquitous Information Management and Communication

【Title】A Farthest First Traversal based Sampling Algorithm for k-clustering

【Abstract】 The farthest-first-traversal (fft) algorithm was originally used by Rosenkrantz et al. in an analysis of heuristics for the traveling salesman problem. It has since been studied extensively in several sampling techniques. In this work, we present a modification of the ProTraS algorithm of Ros and Guillaume, which is also based on the fft algorithm, for sampling datasets for both k-means and k-median clustering. Unlike ProTraS, the proposed algorithm takes the sample size as an input. The algorithm is implemented on the Spark platform and tested on benchmark datasets. We also evaluate the algorithm by comparing it with the adaptive sampling and lightweight coreset algorithms, using the adjusted Rand index.
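
As a rough illustration of the farthest-first traversal idea named in the abstract, the Python sketch below greedily picks a sample of a requested size: it starts from an arbitrary point and repeatedly adds the point farthest from the sample chosen so far. This is only the generic traversal, not the authors' modified ProTraS algorithm or their Spark implementation. The function name `farthest_first_sample`, the `seed` parameter, and the use of Euclidean distance are assumptions made for illustration.

```python
import math
import random


def farthest_first_sample(points, sample_size, seed=0):
    """Return indices of `sample_size` points picked by farthest-first traversal.

    Hypothetical helper for illustration; assumes Euclidean distance.
    """
    if not 0 < sample_size <= len(points):
        raise ValueError("sample_size must be between 1 and len(points)")
    rng = random.Random(seed)
    first = rng.randrange(len(points))  # arbitrary starting point
    chosen = [first]
    # dist_to_sample[i] is the distance from point i to its nearest chosen point.
    dist_to_sample = [math.dist(p, points[first]) for p in points]
    dist_to_sample[first] = -1.0  # mark as chosen so it is never picked again
    while len(chosen) < sample_size:
        # Add the point that is farthest from the current sample.
        nxt = max(range(len(points)), key=dist_to_sample.__getitem__)
        chosen.append(nxt)
        dist_to_sample[nxt] = -1.0
        # Refresh nearest-sample distances against the newly added point.
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < dist_to_sample[i]:
                dist_to_sample[i] = d
    return chosen


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (10.0, 0.0), (10.0, 0.1), (5.0, 8.0)]
    print(farthest_first_sample(data, sample_size=3))
```

Each round costs one pass over the data, so drawing a sample of size m from n points takes on the order of n·m distance evaluations in this sketch. Taking the sample size as an explicit argument mirrors the input the abstract describes for the proposed algorithm.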

【Authors】Le Hong Trang; Nguyen Le Hoang; Tran Khanh Dang

【Affiliation】HCMC University of Technology, Vietnam National University HCMC, HCMC, Vietnam

【Year】2020

【Pages】1-6

【Total pages】6

【Keywords】Clustering algorithms; Signal processing algorithms; Approximation algorithms; Indexes; Computational complexity; Benchmark testing; Noise measurement
