Data clustering using proximity matrices with missing values

Karimzadeh Samira; Olafsson Sigurdur

Abstract

In most applications of data clustering, the input data includes vectors describing the location of each data point, from which distances between data points can be calculated and a proximity matrix constructed. In some applications, however, the only available input is the proximity matrix, that is, the distances between every pair of data points. Several clustering algorithms can still be applied, but if the proximity matrix has missing values, no standard method is directly applicable. Imputation can be used to replace missing values, but most imputation methods do not apply when only the proximity matrix is available. As a partial solution to fill this gap, we propose the Proximity Matrix Completion (PMC) algorithm. This algorithm assumes that data is missing for one of two reasons, complete dissimilarity or incomplete observations, and imputes values accordingly. To determine which case applies, the data is modeled as a graph and a set of maximum cliques in the graph is found. Overlap between cliques then determines the case, and hence the method of imputation, for each missing data point. This approach is motivated by an application in plant breeding, where new experimental seed varieties need to be clustered into sets of varieties that interact similarly with the environment; this application is presented as a case study in the paper. The applicability, limitations and performance of the new algorithm versus other methods of imputation are further studied by applying it to datasets derived from three well-known test datasets. (C) 2019 Elsevier Ltd. All rights reserved.
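
The clique-overlap idea described in the abstract can be illustrated with a small Python sketch. This is not the authors' PMC algorithm, whose details the abstract does not give. It is a simplified illustration under stated assumptions: an edge joins two points whenever their distance was observed, cliques are enumerated with a basic Bron-Kerbosch search, a missing pair whose cliques share at least `min_overlap` points is treated as an incomplete observation and filled with the shortest two-step distance through a shared point, and every other missing pair is treated as completely dissimilar and filled with a large value `far`. The names `maximal_cliques`, `impute_missing`, `min_overlap` and `far`, and both imputation rules, are hypothetical choices made for this example.

```python
from itertools import combinations


def maximal_cliques(adj):
    """Enumerate all maximal cliques with basic Bron-Kerbosch (no pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def impute_missing(dist, far=None, min_overlap=1):
    """Fill None entries of a symmetric distance matrix (list of lists).

    Assumed rules (illustrative only, not taken from the paper):
    - cliques of i and j overlap in >= min_overlap points:
      "incomplete observation", fill with min over shared k of d(i,k) + d(k,j)
    - otherwise: "complete dissimilarity", fill with `far`
    """
    n = len(dist)
    # Graph: an edge means the distance between the two points was observed.
    adj = {i: {j for j in range(n) if j != i and dist[i][j] is not None}
           for i in range(n)}
    cliques = maximal_cliques(adj)
    if far is None:
        # Fallback assumption: use the largest observed distance.
        far = max(dist[i][j] for i, j in combinations(range(n), 2)
                  if dist[i][j] is not None)

    filled = [row[:] for row in dist]
    for i, j in combinations(range(n), 2):
        if dist[i][j] is not None:
            continue
        # Largest overlap between a clique containing i and one containing j.
        best = set()
        for ci in (c for c in cliques if i in c):
            for cj in (c for c in cliques if j in c):
                if len(ci & cj) > len(best):
                    best = set(ci & cj)
        if len(best) >= min_overlap:
            value = min(dist[i][k] + dist[k][j] for k in best)
        else:
            value = far
        filled[i][j] = filled[j][i] = value
    return filled


# Toy example: points 0-2 form one group with d(0,2) missing,
# points 3-4 form another; all cross-group distances are missing.
M = None
example = [
    [0.0, 1.0, M,   M,   M],
    [1.0, 0.0, 2.0, M,   M],
    [M,   2.0, 0.0, M,   M],
    [M,   M,   M,   0.0, 1.5],
    [M,   M,   M,   1.5, 0.0],
]
completed = impute_missing(example, far=10.0)
```

In this toy example the pair (0, 2) shares point 1 through the cliques {0, 1} and {1, 2}, so it is filled with 1.0 + 2.0, while every cross-group pair has no clique overlap and receives the `far` value. The completed matrix could then be passed to any clustering method that accepts a precomputed distance matrix.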

Bibliographic information

Source: Expert Systems with Applications, 2019, No. 7, pp. 265-276 (12 pages)
Authors: Karimzadeh Samira; Olafsson Sigurdur
Author affiliation (both authors): Iowa State Univ, Dept Ind & Mfg Syst Engn, 3004 Black Engn, Ames, IA 50011 USA
Original format: PDF
Language: English
Keywords: Clustering; Imputation; Missing values; Proximity matrix

Similar literature

1. Analysis of incomplete climate data: estimation of mean values and covariance matrices and imputation of missing values [J]. Tapio Schneider. Journal of Climate, 2001, No. 5
2. Imputation of missing values for semi-supervised data using the proximity in random forests [J]. Tsunenori Ishioka. International Journal of Business Intelligence and Data Mining, 2013, No. 2
3. Missing precipitation data estimation using optimal proximity metric-based imputation, nearest-neighbour classification and cluster-based interpolation methods [J]. Ramesh S. V. Teegavarapu. Hydrological Sciences Journal, 2014, No. 11a12
4. Imputation of Missing Values for Unsupervised Data Using the Proximity in Random Forests [C]. Tsunenori Ishioka. International Conference on Mobile, Hybrid, and On-Line Learning, 2013
5. Computational tools for missing values in multivariate longitudinal and clustered data [D]. Yucel, Recai Murat. 2000
6. Multiple Imputation based Clustering Validation (MIV) for Big Longitudinal Trial Data with Missing Values in eHealth [O]. Zhaoyang Zhang, Hua Fang, Honggang Wang
7. Analysis of Incomplete Climate Data: Estimation of Mean Values and Covariance Matrices and Imputation of Missing Values [O]. Schneider Tapio. 2001
