Model-based clustering of multivariate ordinal data relying on a stochastic binary search algorithm

Biernacki Christophe; Jacques Julien


Abstract

We design a probability distribution for ordinal data by modeling the process that generates the data, which is assumed to rely only on order comparisons between categories. By contrast, most competing models either ignore the order information or add non-existent distance information. On optimality grounds, the data generating process is assumed to be a stochastic binary search algorithm in a sorted table. The resulting distribution is natively governed by two meaningful parameters (position and precision) and has very appealing properties: probabilities decrease on either side of the mode, the shape can be tuned from uniform to a Dirac, and the model is identifiable. Moreover, it is easily estimated by an EM algorithm, since the path followed by the stochastic binary search algorithm can be treated as missing data. Then, using the classical latent class assumption, this univariate ordinal model is straightforwardly extended to model-based clustering of multivariate ordinal data. The parameters of this mixture model are estimated by an AECM algorithm. Both simulated and real data sets illustrate the great potential of this model through its ability to parsimoniously identify particularly relevant clusters that some traditional competitors failed to detect.
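A minimal Python sketch of the univariate model and its latent class extension follows. It reflects our reading of the generating process and is not the authors' code. We assume categories 1, ..., m; at each step a breakpoint is drawn uniformly in the current set of categories, which is split into the parts below, at and above it; with probability `pi` (precision) the comparison is accurate and the part closest to the position `mu` is kept, otherwise the comparison is blind and a part is kept with probability proportional to its size. The names `bos_sample`, `bos_pmf` and `mixture_posteriors` are hypothetical. `bos_pmf` sums over every possible search path, which illustrates why that path can serve as missing data in EM; `mixture_posteriors` only computes class membership probabilities under the latent class assumption (the quantities an E-step needs) and does not implement the AECM algorithm itself.

```python
import random
from functools import lru_cache

# Sketch of the stochastic binary search generating process (our assumptions,
# not the authors' implementation): categories 1..m, position mu, precision pi.


def _delta(part: tuple[int, int], mu: int) -> int:
    """Distance from mu to the closest category of the interval part = (lo, hi)."""
    lo, hi = part
    if lo <= mu <= hi:
        return 0
    return lo - mu if mu < lo else mu - hi


def _split(lo: int, hi: int, y: int) -> list[tuple[int, int]]:
    """Non-empty parts of [lo, hi] below, at and above the breakpoint y."""
    return [(a, b) for a, b in ((lo, y - 1), (y, y), (y + 1, hi)) if a <= b]


def bos_sample(m: int, mu: int, pi: float, rng: random.Random) -> int:
    """Draw one category by running the stochastic binary search once."""
    lo, hi = 1, m
    while lo < hi:  # every step strictly shrinks the current interval
        parts = _split(lo, hi, rng.randint(lo, hi))  # uniform breakpoint
        if rng.random() < pi:  # accurate comparison: keep the part closest to mu
            lo, hi = min(parts, key=lambda e: _delta(e, mu))
        else:  # blind comparison: keep a part with probability |part| / |interval|
            lo, hi = rng.choices(parts, weights=[b - a + 1 for a, b in parts])[0]
    return lo


def bos_pmf(m: int, mu: int, pi: float) -> list[float]:
    """Exact P(x = 1), ..., P(x = m), obtained by summing over all search paths."""

    @lru_cache(maxsize=None)
    def final_dist(lo: int, hi: int) -> tuple[float, ...]:
        probs = [0.0] * m
        if lo == hi:  # a single category is left: the search stops here
            probs[lo - 1] = 1.0
            return tuple(probs)
        n = hi - lo + 1
        for y in range(lo, hi + 1):  # each breakpoint has probability 1 / n
            parts = _split(lo, hi, y)
            best = min(parts, key=lambda e: _delta(e, mu))
            for part in parts:
                size = part[1] - part[0] + 1
                # blind pick (prob 1 - pi) plus accurate pick of the closest part (prob pi)
                p_part = (1.0 - pi) * size / n + (pi if part == best else 0.0)
                if p_part > 0.0:
                    sub = final_dist(*part)
                    for k in range(m):
                        probs[k] += p_part * sub[k] / n
        return tuple(probs)

    return list(final_dist(1, m))


def mixture_posteriors(x, weights, mus, pis, ms):
    """P(class k | x) for one ordinal vector x, assuming the variables are
    independent BOS given the class (latent class assumption)."""
    joint = []
    for w, mu_k, pi_k in zip(weights, mus, pis):
        p = w
        for j, xj in enumerate(x):
            p *= bos_pmf(ms[j], mu_k[j], pi_k[j])[xj - 1]
        joint.append(p)
    total = sum(joint)  # mixture probability of x
    return [p / total for p in joint]
```

For instance, `bos_pmf(5, 3, 0.0)` should give the uniform distribution over the five categories, while `bos_pmf(5, 3, 1.0)` puts all its mass on category 3, the two shape extremes mentioned in the abstract.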


Bibliographic details

Source
Statistics and Computing, 2016, no. 5, pp. 929-943 (15 pages)
Authors
Biernacki Christophe; Jacques Julien
Affiliations

Univ Lille 1, Lab Painlevé, F-59655 Villeneuve-d'Ascq, France; Univ Lille 1, Inria, F-59655 Villeneuve-d'Ascq, France

Univ Lyon 2, Lab ERIC, F-69676 Bron, France

Original format: PDF
Language: English
Keywords
Ordinal data; Binary search algorithm; Latent variables; AECM algorithm


Similar articles

1. In search of optimal centroids on data clustering using a binary search algorithm [J]. Abdolreza Hatamlou. Pattern Recognition Letters, 2012, no. 13.

2. Bayesian model determination for multivariate ordinal and binary data [J]. Emily L. Webb, Jonathan J. Forster. Computational Statistics & Data Analysis, 2008, no. 5.

3. Bayesian model-based clustering for longitudinal ordinal data [J]. Costilla Roy, Liu Ivy, Arnold Richard, et al. Computational Statistics, 2019, no. 3.

4. A Model-Based Multivariate Time Series Clustering Algorithm [C]. Pei-Yuan Zhou, Keith C.C. Chan. Pacific-Asia conference on knowledge discovery and data mining; International workshop on biologically inspired data mining techniques; International workshop on data analytics for targeted healthcare; International workshop on big data science and engineering on e-commerce; International workshop on mobile data management, mining and computing on social networks; International workshop on mobile sensing, mining and visualization for human behavior inferences; International workshop on cloud service discovery; International conference on algorithms for large-scale information processing in knowledge discovery; International workshop on scalable data analytics: theory and algorithms; International workshop on data mining in biomedical informatics and healthcare; International workshop on data mining in social networks; International workshop on pattern mining and application of big data, 2014.

5. Supervised precision ordinal clustering – A human-machine learning algorithm to create accurate clusters in big datasets: Application to Indiana water quality data with novel visualization techniques [D]. Singh, Sarabjit. 2014.

6. A model-based circular binary segmentation algorithm for the analysis of array CGH data [O]. Fang-Han Hsu, Hung-I H Chen, Mong-Hsun Tsai, et al. 2011.

7. Model-based clustering of multivariate ordinal data relying on a stochastic binary search algorithm [O]. Christophe Biernacki, Julien Jacques. 2015.

8. Multi-Parent Clustering Algorithms from Stochastic Grammar Data Models [R]. Mjolsness, Eric; Castano, Rebecca; Gray, Alexander. 1999.
