Journal: Acta Botanica Hungarica

VALIDATION OF HIERARCHICAL CLASSIFICATIONS BY SPLITTING DATASET


Abstract

Any ecological study necessarily analyses a sample, because the whole statistical population cannot be measured. Numerical classification is a useful tool for exploring the structure of many kinds of ecological data, but it reflects the structure of the studied dataset (the sample), whereas we are interested in the structure of the statistical population from which the sample was drawn. Some of the clusters produced by a classification may be representative only of the sample and not of the whole statistical population; such clusters can be called "artificial". This paper describes a method that helps to avoid interpreting these "artificial" clusters. The method is called validation because its steps are similar to those of validation in other fields of numerical analysis. In cluster analysis the defining characteristics of the particular clusters are unknown, so it is not possible to formulate testable hypotheses from the results of the cluster analysis. Therefore, the method proposed here does not compare the clusters themselves but the "meaning" of the clusters, i.e. the characteristics that are used to interpret the results. Frequency of species was chosen as the "meaning" of clusters here, but other characteristics, e.g. the mean or median of continuous variables, could also be used. The new method is applied to an artificial data set to illustrate the procedure and to show its merits.
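
The abstract does not give the algorithmic details, so the following Python sketch is only one possible reading of the idea. It splits a presence/absence table into two random halves, classifies each half separately, and checks whether every cluster of one half has a counterpart in the other half with a similar species-frequency profile. The Jaccard dissimilarity, the average-linkage clustering, the mean-absolute-difference matching rule, the threshold, and all function names are assumptions made for illustration, not the authors' procedure.

```python
import random

def jaccard_distance(a, b):
    """Jaccard dissimilarity between two presence/absence vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 1.0 - both / either if either else 0.0

def average_linkage(plots, k):
    """Naive agglomerative clustering (average linkage), cut at k clusters."""
    clusters = [[i] for i in range(len(plots))]
    dist = [[jaccard_distance(p, q) for q in plots] for p in plots]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sum(dist[a][b] for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)  # j > i, so index i stays valid
    return clusters

def species_frequencies(plots, members):
    """Relative frequency of each species within one cluster (its 'meaning')."""
    n_species = len(plots[0])
    return [sum(plots[m][s] for m in members) / len(members)
            for s in range(n_species)]

def profile_distance(f, g):
    """Mean absolute difference between two frequency profiles."""
    return sum(abs(x - y) for x, y in zip(f, g)) / len(f)

def validate_by_splitting(plots, k, threshold, rng):
    """Split the plots into two halves, classify each half, compare meanings.

    Returns one (distance, is_supported) pair per cluster of the first half,
    where distance is to the closest cluster profile in the second half.
    """
    idx = list(range(len(plots)))
    rng.shuffle(idx)
    half = len(idx) // 2
    halves = [[plots[i] for i in idx[:half]], [plots[i] for i in idx[half:]]]
    profiles = []
    for part in halves:
        clusters = average_linkage(part, k)
        profiles.append([species_frequencies(part, c) for c in clusters])
    results = []
    for f in profiles[0]:
        d = min(profile_distance(f, g) for g in profiles[1])
        results.append((d, d <= threshold))
    return results

# Hypothetical artificial data: two plot groups with opposite species preferences.
rng = random.Random(0)
group_a = [[int(rng.random() < (0.9 if s < 5 else 0.1)) for s in range(10)]
           for _ in range(20)]
group_b = [[int(rng.random() < (0.1 if s < 5 else 0.9)) for s in range(10)]
           for _ in range(20)]
for distance, supported in validate_by_splitting(group_a + group_b, 2, 0.25, rng):
    print(f"profile distance {distance:.2f}, supported by other half: {supported}")
```

A cluster whose frequency profile has no close match in the other half would be a candidate "artificial" cluster under this reading. For continuous variables, `species_frequencies` could be swapped for a per-cluster mean or median, as the abstract suggests.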