Journal: The Annals of Applied Statistics

AN ALGORITHM FOR DECIDING THE NUMBER OF CLUSTERS AND VALIDATION USING SIMULATED DATA WITH APPLICATION TO EXPLORING CROP POPULATION STRUCTURE

Abstract

A first step in exploring population structure in crop plants and other organisms is to define the number of subpopulations that exist in a given data set. Genetic marker data sets have grown increasingly large over time and are commonly of the high-dimension, low sample size (HDLSS) type. An algorithm for deciding the number of clusters is proposed and validated on simulated data sets that vary in both the level of structure and the number of clusters, covering the range of variation observed empirically. The algorithm was then tested on six empirical data sets across three small grain species. The algorithm uses bootstrapping and three methods of clustering, and defines the optimum number of clusters based on a common criterion, Hubert's gamma statistic. Validation on simulated sets coupled with testing on empirical sets suggests that the algorithm can be used for a wide variety of genetic data sets.
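The abstract names the ingredients (bootstrapping, clustering, and Hubert's gamma as the selection criterion) without giving details, so the following is only a minimal sketch of how such a selection loop might look, not the authors' implementation. It makes several assumptions that the abstract does not state: the normalized form of Hubert's gamma is used (the Pearson correlation between pairwise distances and a 0/1 "different cluster" indicator), the bootstrap resamples marker columns, distances are Euclidean, and a single naive average-linkage clustering stands in for the paper's three clustering methods. The function names `hubert_gamma`, `average_linkage`, and `choose_k` are hypothetical.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def hubert_gamma(rows, labels):
    """Normalized Hubert's gamma (assumed form): correlation between each pair's
    Euclidean distance and an indicator that the pair lies in different clusters."""
    dists = [math.dist(a, b) for a, b in combinations(rows, 2)]
    differs = [0.0 if a == b else 1.0 for a, b in combinations(labels, 2)]
    return pearson(dists, differs)


def average_linkage(rows, k):
    """Naive agglomerative clustering with average linkage, stopped at k clusters.
    A stand-in for the (unspecified) clustering methods; fine for small samples only."""
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            total = sum(math.dist(rows[a], rows[b])
                        for a in clusters[i] for b in clusters[j])
            mean_d = total / (len(clusters[i]) * len(clusters[j]))
            if best is None or mean_d < best[0]:
                best = (mean_d, i, j)
        _, i, j = best
        clusters[i] += clusters[j]  # merge the closest pair (i < j)
        del clusters[j]
    labels = [0] * len(rows)
    for cluster_id, members in enumerate(clusters):
        for m in members:
            labels[m] = cluster_id
    return labels


def choose_k(rows, k_values, n_boot=20, seed=0):
    """For each candidate k, average Hubert's gamma over bootstrap resamples of the
    marker columns (an assumption) and return the k with the highest mean."""
    rng = random.Random(seed)
    n_markers = len(rows[0])
    mean_gamma = {}
    for k in k_values:
        total = 0.0
        for _ in range(n_boot):
            cols = [rng.randrange(n_markers) for _ in range(n_markers)]
            sample = [[row[c] for c in cols] for row in rows]
            total += hubert_gamma(sample, average_linkage(sample, k))
        mean_gamma[k] = total / n_boot
    best_k = max(mean_gamma, key=mean_gamma.get)
    return best_k, mean_gamma
```

Here `rows` would hold one list of numeric marker scores per individual, and a call such as `choose_k(rows, [2, 3, 4, 5])` returns the candidate with the highest mean gamma together with the per-candidate means. With several clustering methods, the same loop could be run once per method and the results compared on the shared gamma scale.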