Density Biased Sampling: An Improved Method for Data Mining and Clustering

Abstract

Data mining in large data sets often requires a sampling or summarization step to form an in-core representation of the data that can be processed more efficiently. Uniform random sampling is frequently used in practice and also frequently criticized because it will miss small clusters. Many natural phenomena are known to follow Zipf's distribution, so the inability of uniform sampling to find small clusters is of practical concern. Density Biased Sampling is proposed to probabilistically under-sample dense regions and over-sample light regions. A weighted sample is used to preserve the densities of the original data. Density biased sampling naturally includes uniform sampling as a special case. A memory-efficient algorithm is proposed that approximates density biased sampling using only a single scan of the data. We empirically evaluate density biased sampling using synthetic data sets that exhibit varying cluster size distributions. Our proposed method scales linearly and outperforms uniform sampling when clustering realistic data sets.
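
The idea in the abstract can be illustrated with a short sketch. This is not the report's algorithm: the grid binning, the exponent parameter `e`, and the names `density_biased_sample`, `cell_size`, and `target_size` are assumptions made for illustration, and the sketch uses two passes over in-memory data rather than the single-scan approximation the abstract mentions. Each point is kept with probability proportional to (cell count)^(-e) and carries a weight equal to the inverse of that probability, so that `e = 0` reduces to uniform sampling.

```python
import random
from collections import Counter


def density_biased_sample(points, cell_size, target_size, e=0.5, seed=0):
    """Illustrative two-pass sketch (not the report's single-scan method).

    Points are binned into grid cells; a point in a cell holding n points
    is kept with probability alpha * n**(-e) and given weight 1/probability.
    """
    rng = random.Random(seed)

    def cell_of(p):
        # Grid cell index of a point: floor of each coordinate / cell_size.
        return tuple(int(x // cell_size) for x in p)

    # Pass 1: count points per cell as a crude local density estimate.
    counts = Counter(cell_of(p) for p in points)

    # Choose alpha so the expected sample size is target_size:
    # sum over cells of n * alpha * n**(-e) == target_size.
    alpha = target_size / sum(n ** (1.0 - e) for n in counts.values())

    # Pass 2: keep each point with its cell's inclusion probability.
    sample = []
    for p in points:
        n = counts[cell_of(p)]
        # Clipping at 1 can make the expected size fall below target_size.
        prob = min(1.0, alpha * n ** (-e))
        if rng.random() < prob:
            sample.append((p, 1.0 / prob))
    return sample


if __name__ == "__main__":
    dense = [(0.1 + i * 1e-4, 0.1) for i in range(1000)]
    sparse = [(5.1 + i * 1e-3, 5.1) for i in range(20)]
    for point, weight in density_biased_sample(dense + sparse, 1.0, 20, e=1.0):
        print(point, weight)
```

With `e = 1.0` every occupied cell contributes the same expected number of sampled points, so the 20-point cluster is over-sampled relative to the 1000-point cluster, while the weights let a downstream clustering step recover the original densities.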

Bibliographic details

Authors
Palmer, C. R.; Faloutsos, C.
Year: 1999
Pages: 1-23
Total pages: 23
Original format: PDF
Language: English
Chinese Library Classification: Industrial technology
Keywords
Data bases; Data management; Algorithms; Statistical samples; Clustering; Bias

