VALIDATION OF HIERARCHICAL CLASSIFICATIONS BY SPLITTING DATASET

Z. BOTTA-DUKAT

Abstract

Every ecological study analyses a sample, because the whole statistical population cannot be measured. Numerical classification is a useful tool for exploring the structure of many kinds of ecological data, but it reflects the structure of the studied dataset (the sample), whereas we are interested in the structure of the statistical population from which the sample is drawn. Some of the clusters obtained by the classification may be representative only of the sample and not of the whole statistical population; such clusters can be called "artificial". This paper describes a method that helps to avoid interpreting these "artificial" clusters. The method is called validation, because its steps are similar to validation as used in other fields of numerical analysis. In cluster analysis the defining characteristics of the particular clusters are unknown, so it is not possible to formulate a testable hypothesis from the results of the cluster analysis. Therefore, the method proposed here does not compare the clusters themselves but the "meaning" of the clusters, i.e. the characteristics that are used to interpret the results. Frequency of species was chosen as the "meaning" of clusters here, but other characteristics, e.g. the mean or median of continuous variables, can also be used. The new method is applied to an artificial data set to illustrate the procedure and to show its merits.
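
The abstract does not spell out the individual steps, so the following is only a minimal sketch of the general idea (split the dataset, classify each part, compare the "meaning" of the clusters), not the paper's actual procedure. Everything specific in it is an assumption made for illustration: plots are represented as sets of species, Jaccard distance and average linkage are used for the classification, the number of clusters k is fixed in advance, cluster "meanings" are compared by the mean absolute difference in species frequencies, and all function names are hypothetical.

```python
import random
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard distance between two plots, each given as a set of species."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def average_linkage(plots, k):
    """Agglomerative clustering with average linkage, stopped at k clusters.

    Returns a list of clusters, each a list of indices into `plots`.
    """
    clusters = [[i] for i in range(len(plots))]

    def mean_distance(pair):
        ca, cb = clusters[pair[0]], clusters[pair[1]]
        total = sum(jaccard_distance(plots[i], plots[j]) for i in ca for j in cb)
        return total / (len(ca) * len(cb))

    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2), key=mean_distance)
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair (a < b)
        del clusters[b]
    return clusters


def species_frequencies(plots, cluster, species):
    """Fraction of the plots in `cluster` that contain each species."""
    return [sum(sp in plots[i] for i in cluster) / len(cluster) for sp in species]


def compare_halves(half_a, half_b, k):
    """For each cluster of half A, return the smallest mean absolute difference
    in species frequencies to any cluster of half B (0 = identical meaning)."""
    species = sorted(set().union(*half_a, *half_b))
    prof_a = [species_frequencies(half_a, c, species) for c in average_linkage(half_a, k)]
    prof_b = [species_frequencies(half_b, c, species) for c in average_linkage(half_b, k)]

    def profile_diff(p, q):
        return sum(abs(x - y) for x, y in zip(p, q)) / len(species)

    return [min(profile_diff(p, q) for q in prof_b) for p in prof_a]


def validate_by_splitting(plots, k, rng):
    """Randomly split the plots into two halves and compare cluster meanings."""
    idx = list(range(len(plots)))
    rng.shuffle(idx)
    mid = len(idx) // 2
    half_a = [plots[i] for i in idx[:mid]]
    half_b = [plots[i] for i in idx[mid:]]
    return compare_halves(half_a, half_b, k)


# Artificial data, made up for illustration: two vegetation types with
# disjoint species pools.
type_1 = [{"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}] * 3
type_2 = [{"d", "e"}, {"d", "f"}, {"e", "f"}, {"d", "e", "f"}] * 3
diffs = validate_by_splitting(type_1 + type_2, k=2, rng=random.Random(0))
print(diffs)  # one value per cluster of the first half
```

Under these assumptions, a cluster whose species-frequency profile has no close counterpart in the other half would be a candidate "artificial" cluster. How small a difference counts as a match is left open here.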

Bibliographic record

Source: Acta Botanica Hungarica | 2008, Issue 2 | 8 pages
Author: Z. BOTTA-DUKAT
Original format: PDF
Language: English
Chinese Library Classification: Botany
Keywords: classification; validation; data splitting
