Statistical approach to normalization of feature vectors and clustering of mixed datasets

Suarez-Alvarez M.M.; Pham D.-T.; Prostov M.Y.; Prostov Y.I.


Abstract

Normalization of feature vectors of datasets is widely used in a number of fields of data mining, in particular in cluster analysis, where it is used to prevent features with large numerical values from dominating in distance-based objective functions. In this study, a unified statistical approach to normalization of all attributes of mixed databases, when different metrics are used for numerical and categorical data, is proposed. After the proposed normalization, the contributions of both numerical and categorical attributes to a specified objective function are statistically the same. Formulae for the statistically normalized Minkowski mixed p-metrics are given in an explicit way. It is shown that the classic z-score standardization and the min-max normalization are particular cases of the statistical normalization, when the objective function is based, respectively, on the Euclidean or the Tchebycheff (Chebyshev) metrics. Finally, clustering of several benchmark datasets is performed with non-normalized and introduced normalized mixed metrics using either the k-prototypes (for p = 2) or another algorithm (for p ≠ 2).
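The two classic normalizations named in the abstract can be illustrated with a minimal Python sketch. It shows only the textbook z-score and min-max forms, plus a simple mixed dissimilarity in the spirit of k-prototypes. The function names, the weight `gamma`, and the simple-matching treatment of categorical attributes are assumptions made for illustration; the paper's statistically normalized formulae are not reproduced in this record and are not implemented here.

```python
import numpy as np


def z_score(x):
    """Classic z-score standardization: zero mean, unit standard deviation per column.

    Assumes no column is constant (otherwise the standard deviation is zero).
    """
    x = np.asarray(x, dtype=float)
    return (x - x.mean(axis=0)) / x.std(axis=0)


def min_max(x):
    """Classic min-max normalization: rescale each column to the interval [0, 1].

    Assumes no column is constant (otherwise max equals min).
    """
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(axis=0), x.max(axis=0)
    return (x - lo) / (hi - lo)


def mixed_minkowski(num_a, num_b, cat_a, cat_b, p=2.0, gamma=1.0):
    """Hypothetical mixed dissimilarity between two records.

    Numerical attributes contribute sum(|a - b|^p); categorical attributes
    contribute gamma times the number of mismatches (simple matching).
    The weight gamma is an illustrative assumption, not the paper's
    statistical normalization factor.
    """
    num_a = np.asarray(num_a, dtype=float)
    num_b = np.asarray(num_b, dtype=float)
    numeric_part = np.sum(np.abs(num_a - num_b) ** p)
    mismatches = sum(a != b for a, b in zip(cat_a, cat_b))
    return float((numeric_part + gamma * mismatches) ** (1.0 / p))


# Toy data: the second column is 100 times larger than the first.
X = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
print(z_score(X))
print(min_max(X))
print(mixed_minkowski([0.0, 0.0], [3.0, 4.0], ["a", "b"], ["a", "c"]))
```

On the toy data, both normalizations put the two columns on the same scale, so the large-valued second column no longer dominates a distance computed over both. Choosing `gamma` by hand is exactly the kind of ad hoc weighting that the abstract's statistical approach is meant to replace.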


Bibliographic details

Source
Proceedings of the Royal Society. Mathematical, physical and engineering sciences | 2012, Issue 2145 | 22 pages
Authors
Suarez-Alvarez M.M.; Pham D.-T.; Prostov M.Y.; Prostov Y.I.
Original format: PDF
Language: English
Chinese Library Classification: Mathematics
Keywords
Clustering; Minkowski metrics; Normalization; Standardization; Statistics
