Journal: Advances in Data Analysis and Classification

Model-based clustering of probability density functions

Abstract

Complex data in which each statistical unit under study is described not by a single observation (or vector variable) but by a unit-specific sample of several or even many observations are becoming increasingly common. Reducing these sample data to summary statistics, such as the mean or the median, discards most of the inherent information (about variability, skewness or multi-modality). Full information is preserved only if each unit is described by a whole distribution. This new kind of data, also known as “distribution-valued data”, requires the development of adequate statistical methods. This paper presents a method to group a set of probability density functions (pdfs) into homogeneous clusters, given that the pdfs have to be estimated nonparametrically from the unit-specific data. Since elements belonging to the same cluster are naturally thought of as samples from the same probability model, the idea is to tackle the clustering problem by defining and estimating a proper mixture model on the space of pdfs. Model building is challenging here because of the infinite dimensionality and the non-Euclidean geometry of the domain space. By adopting a wavelet-based representation for the elements of the space, the task is accomplished by using mixture models for hyper-spherical data. The proposed solution is illustrated through a simulation experiment and on two real data sets.
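
The abstract gives no implementation details, so the following is only a rough sketch of how such a pipeline might look, under several assumptions that are not taken from the paper: each pdf is estimated by a histogram, the square root of the estimate is used (its squared entries sum to one, so it lies on a unit hypersphere), an orthonormal Haar transform stands in for the wavelet representation, and spherical k-means is swapped in as a much simpler substitute for a mixture model for hyper-spherical data. All function names and parameter values are hypothetical.

```python
import numpy as np

def sqrt_density(sample, lo, hi, n_bins=64):
    """Histogram estimate of a pdf, returned as sqrt(bin probabilities).
    The result has unit Euclidean norm (assumed square-root representation)."""
    counts, _ = np.histogram(sample, bins=n_bins, range=(lo, hi))
    return np.sqrt(counts / counts.sum())

def haar_transform(v):
    """Orthonormal discrete Haar transform; len(v) must be a power of two.
    Being orthonormal, it preserves the Euclidean norm of v."""
    v = np.asarray(v, dtype=float)
    details = []
    while len(v) > 1:
        even, odd = v[0::2], v[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # detail coefficients
        v = (even + odd) / np.sqrt(2.0)              # coarser approximation
    return np.concatenate([v] + details[::-1])

def spherical_kmeans(X, k, n_iter=20):
    """Cluster unit vectors by cosine similarity (a stand-in, not a mixture model)."""
    # Deterministic farthest-first initialisation of the cluster directions.
    centers = [X[0]]
    for _ in range(k - 1):
        best_sim = np.max(X @ np.array(centers).T, axis=1)
        centers.append(X[np.argmin(best_sim)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmax(X @ centers.T, axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                mean = members.sum(axis=0)
                centers[j] = mean / np.linalg.norm(mean)  # project back to sphere
    return np.argmax(X @ centers.T, axis=1)

# Toy data (assumed): 20 units with 500 observations each, from two Gaussians.
rng = np.random.default_rng(0)
samples = [rng.normal(0.0, 1.0, 500) for _ in range(10)]
samples += [rng.normal(4.0, 1.0, 500) for _ in range(10)]

# One row of wavelet coefficients per unit; every row lies on the unit sphere.
X = np.array([haar_transform(sqrt_density(s, -4.0, 8.0)) for s in samples])
labels = spherical_kmeans(X, k=2)
print(np.linalg.norm(X, axis=1))
print(labels)
```

Replacing the k-means step with a fitted mixture of directional distributions (for example von Mises-Fisher components) would be closer in spirit to the model-based approach the abstract describes, at the cost of a more involved estimation procedure.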
