An Improved Density Biased Sampling Algorithm for Clustering Large-scale Datasets

Xuezhong Qian; Kaiyuan Sheng; Heng Qian; Qin Wu

Abstract

Simple random sampling is one of the most popular data reduction methods in large-scale data mining, but it often loses small clusters when the dataset is unevenly distributed. Grid-based density biased sampling avoids this problem; however, both its efficiency and its effectiveness are restricted by the grid granularity. To overcome these drawbacks, a density biased sampling algorithm based on variable grid division is proposed. Each dimension of the original dataset is divided according to its own distribution, so that the structure of the generated grid matches the distribution of the original dataset. Experimental results demonstrate that density biased sampling based on variable grid division achieves higher quality than simple random sampling and takes less sampling time than the grid-based density biased sampling algorithm.
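The abstract does not spell out how each dimension is divided, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes equal-frequency (quantile) cut points per dimension as a stand-in for "variable grid division", and it borrows the selection rule from the original density biased sampling formulation: a point in a cell holding n points is kept with probability alpha / n^e, where alpha is chosen so that the expected sample size equals the target. The function names, the `bins` and `e` parameters, and the demo data are all hypothetical.

```python
import bisect
import random
from collections import defaultdict


def quantile_edges(values, bins):
    """Cut points giving roughly equal counts per bin (assumed division rule)."""
    ordered = sorted(values)
    n = len(ordered)
    cuts = [ordered[(k * n) // bins] for k in range(1, bins)]
    return sorted(set(cuts))  # drop repeated cuts caused by tied values


def cell_of(point, edges_per_dim):
    """Map a point to its grid cell: one bin index per dimension."""
    return tuple(bisect.bisect_right(edges, x)
                 for x, edges in zip(point, edges_per_dim))


def density_biased_sample(points, target_size, bins=8, e=0.5, seed=0):
    """Sample so that points in sparse cells are kept more readily than points in dense cells.

    e = 0 reduces to uniform sampling; e = 1 gives every non-empty cell
    the same expected number of sampled points.
    """
    rng = random.Random(seed)
    dims = len(points[0])
    # Each dimension gets its own cut points, derived from its own values.
    edges_per_dim = [quantile_edges([p[d] for p in points], bins)
                     for d in range(dims)]

    cells = defaultdict(list)
    for p in points:
        cells[cell_of(p, edges_per_dim)].append(p)

    # alpha normalises the expected sample size to target_size
    # (before the probabilities are clipped at 1.0 below).
    alpha = target_size / sum(len(m) ** (1.0 - e) for m in cells.values())

    sample = []
    for members in cells.values():
        keep_prob = min(1.0, alpha / len(members) ** e)
        sample.extend(p for p in members if rng.random() < keep_prob)
    return sample


# Demo on synthetic data: one large cluster and one small, distant cluster.
demo_rng = random.Random(42)
big = [(demo_rng.gauss(0.0, 1.0), demo_rng.gauss(0.0, 1.0)) for _ in range(5000)]
small = [(demo_rng.gauss(8.0, 0.3), demo_rng.gauss(8.0, 0.3)) for _ in range(50)]
small_set = set(small)

sample = density_biased_sample(big + small, target_size=200, bins=8, e=0.5)
kept_small = sum(1 for p in sample if p in small_set)
print(f"sample size: {len(sample)}, points kept from the small cluster: {kept_small}")
```

Because the cut points follow each dimension's own values, dense regions are split into narrower bins than sparse ones, which is the intuition behind letting the grid match the data distribution. Raising `e` toward 1 shifts more of the sample budget to sparse cells, which is how small clusters are protected from being lost.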


Bibliographic details

Source
Journal of Information and Computational Science, 2014, No. 7, pp. 2355-2364 (10 pages)
Authors
Xuezhong Qian; Kaiyuan Sheng; Heng Qian; Qin Wu
Affiliations

School of Internet of Things, Jiangnan University, Wuxi 214122, China;

School of Internet of Things, Jiangnan University, Wuxi 214122, China;

School of Information Engineering, Yangzhou University, Yangzhou 225127, China;

School of Internet of Things, Jiangnan University, Wuxi 214122, China;

Original format: PDF
Language: English
Keywords
Density Biased Sampling; Variable Grid Division; Data Mining; Large-scale Dataset; Clustering


Similar literature

1. An Efficient Density Biased Sampling Algorithm for Clustering Large High-Dimensional Datasets [J]. Qian Xue-Zhong, Deng Jie. International Journal of Pattern Recognition and Artificial Intelligence, 2015, No. 8.
2. An improved density-based single sliding clustering algorithm for large datasets in the cultural information system [J]. Tolba Amr, Al-Makhadmeh Zafer. Personal and Ubiquitous Computing, 2020, No. 1.
3. Adaptive weighted over-sampling for imbalanced datasets based on density peaks clustering with heuristic filtering [J]. Information Sciences: An International Journal, 2020.
4. An improved density-based cluster analysis method combining genetic algorithm and data sampling for large-scale datasets [C]. Ye Zonglin, Cao Hui, Wang Miaomiao. Chinese Control Conference, 2013.
5. Supervised precision ordinal clustering – A human-machine learning algorithm to create accurate clusters in big datasets: Application to Indiana water quality data with novel visualization techniques [D]. Singh, Sarabjit. 2014.
6. A multiple kernel density clustering algorithm for incomplete datasets in bioinformatics [O]. Longlong Liao, Kenli Li, Keqin Li. 2018.
7. An Improved Semi-Supervised Clustering Algorithm for Multi-Density Datasets with Fewer Constraints [O]. Chen Xiaoyun, Liu Sha, Chen Tao. 2012.
8. Density Biased Sampling: An Improved Method for Data Mining and Clustering [R]. Palmer, C. R., Faloutsos, C. 1999.
