Journal: Journal of information and computational science

An Improved Density Biased Sampling Algorithm for Clustering Large-scale Datasets

Abstract

Simple random sampling is one of the most popular data reduction methods in large-scale data mining, but it often loses small clusters when the dataset is unevenly distributed. Grid-based density biased sampling can avoid this problem; however, both its efficiency and its effectiveness are restricted by the grid granularity. To overcome these drawbacks, a density biased sampling algorithm based on variable grid division is proposed. Each dimension of the original dataset is divided according to its own distribution, so that the structure of the generated grid matches the distribution of the original dataset. Experimental results demonstrate that density biased sampling based on variable grid division achieves higher quality than simple random sampling and takes less sampling time than the grid-based density biased sampling algorithm.
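
To make the idea of grid-based density biased sampling concrete, here is a minimal sketch of the baseline the abstract refers to. It is not the paper's algorithm: the abstract does not say how the variable grid is built, so the sketch uses a fixed equal-width grid. The weighting rule (keep a point in a cell of n points with probability alpha / n**e) is an assumed, commonly used form, and the names grid_cells, cell_probabilities, density_biased_sample, bins, and e are hypothetical.

```python
import random
from collections import Counter


def grid_cells(points, bins):
    """Assign each point to a cell of an equal-width grid (bins per dimension)."""
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]
    widths = [(hi - lo) / bins for lo, hi in zip(lows, highs)]
    cells = []
    for p in points:
        cell = tuple(
            # Clamp so the maximum value lands in the last bin.
            min(bins - 1, int((x - lo) / w)) if w > 0 else 0
            for x, lo, w in zip(p, lows, widths)
        )
        cells.append(cell)
    return cells


def cell_probabilities(counts, target_size, e):
    """Per-point inclusion probability for each occupied cell.

    Assumed weighting: a point in a cell holding n points is kept with
    probability alpha / n**e, where alpha is chosen so the expected sample
    size equals target_size (before clipping to 1.0). e = 0 reduces to
    uniform sampling; e = 1 gives every occupied cell the same expected
    number of sampled points.
    """
    alpha = target_size / sum(n ** (1.0 - e) for n in counts.values())
    return {cell: min(1.0, alpha / n ** e) for cell, n in counts.items()}


def density_biased_sample(points, target_size, bins=4, e=1.0, seed=0):
    """Keep each point independently with its cell's inclusion probability."""
    rng = random.Random(seed)
    cells = grid_cells(points, bins)
    probs = cell_probabilities(Counter(cells), target_size, e)
    return [p for p, c in zip(points, cells) if rng.random() < probs[c]]


# Toy data: a dense cluster of 2500 points and a sparse cluster of 25 points.
dense = [(i / 50, j / 50) for i in range(50) for j in range(50)]
sparse = [(9 + i / 10, 9 + j / 10) for i in range(5) for j in range(5)]
points = dense + sparse

sample = density_biased_sample(points, target_size=20)
sparse_set = set(sparse)
sparse_kept = sum(1 for p in sample if p in sparse_set)
print("sample size:", len(sample), "from sparse cluster:", sparse_kept)
```

In this toy dataset the sparse cluster makes up about 1% of the points, so a uniform sample of 20 would often miss it entirely. With e = 1 each of its points is kept with probability 0.4, against 0.004 for points in the dense cluster. The sketch also shows the granularity problem the abstract mentions: with an equal-width grid, the choice of bins decides whether a small cluster gets a cell of its own or is absorbed into a dense one.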