
An algorithm for deciding the number of clusters and validation using simulated data with application to exploring crop population structure



Abstract

A first step in exploring population structure in crop plants and other organisms is to define the number of subpopulations that exist for a given data set. Genetic marker data sets have grown increasingly large over time and are commonly high-dimension, low sample size (HDLSS). An algorithm for deciding the number of clusters is proposed and validated on simulated data sets varying in both the level of structure and the number of clusters, covering the range of variation observed empirically. The algorithm was then tested on six empirical data sets across three small grain species. The algorithm uses bootstrapping and three methods of clustering, and defines the optimum number of clusters based on a common criterion, Hubert's gamma statistic. Validation on simulated sets coupled with testing on empirical sets suggests that the algorithm can be used for a wide variety of genetic data sets.
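The general idea in the abstract (bootstrap the data, cluster each resample with several methods, and score every candidate number of clusters with Hubert's gamma) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not name the three clustering methods, the distance measure, or what is resampled, so the sketch assumes single, complete, and average linkage, Euclidean distance, and resampling of individuals (rows). All function names (`hubert_gamma`, `agglomerative`, `choose_k`, `make_blobs`) and the toy data are hypothetical. Hubert's gamma is computed here in its normalized form: the Pearson correlation between pairwise distances and a 0/1 indicator of whether a pair falls in different clusters.

```python
import math
import random
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hubert_gamma(points, labels):
    """Normalized Hubert's gamma: Pearson correlation between pairwise
    distances and a 0/1 'pair is in different clusters' indicator."""
    dists, diff = [], []
    for i, j in combinations(range(len(points)), 2):
        dists.append(euclidean(points[i], points[j]))
        diff.append(0.0 if labels[i] == labels[j] else 1.0)
    n = len(dists)
    md, mf = sum(dists) / n, sum(diff) / n
    cov = sum((d - md) * (f - mf) for d, f in zip(dists, diff))
    sd = math.sqrt(sum((d - md) ** 2 for d in dists))
    sf = math.sqrt(sum((f - mf) ** 2 for f in diff))
    return cov / (sd * sf) if sd > 0 and sf > 0 else 0.0


def agglomerative(points, k, linkage):
    """Naive agglomerative clustering down to k clusters. `linkage` reduces
    the list of between-cluster point distances to one number."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = linkage([euclidean(points[i], points[j])
                         for i in clusters[a] for j in clusters[b]])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters[b]  # a < b, so index a stays valid
        del clusters[b]
    labels = [0] * len(points)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


# Assumption: three stand-in clustering methods (the abstract names none).
METHODS = {
    "single": min,
    "complete": max,
    "average": lambda ds: sum(ds) / len(ds),
}


def choose_k(points, k_range, n_boot=10, seed=0):
    """Return (best k, mean gamma per k), averaging Hubert's gamma over
    bootstrap resamples and over all clustering methods."""
    rng = random.Random(seed)
    scores = {k: [] for k in k_range}
    for _ in range(n_boot):
        # Assumption: resample individuals (rows) with replacement.
        sample = [rng.choice(points) for _ in points]
        for k in k_range:
            for linkage in METHODS.values():
                labels = agglomerative(sample, k, linkage)
                scores[k].append(hubert_gamma(sample, labels))
    mean = {k: sum(v) / len(v) for k, v in scores.items()}
    return max(mean, key=mean.get), mean


def make_blobs(centers, n_per, sd, seed=0):
    """Toy 2-D data: n_per Gaussian points around each center."""
    rng = random.Random(seed)
    return [(cx + rng.gauss(0, sd), cy + rng.gauss(0, sd))
            for cx, cy in centers for _ in range(n_per)]


data = make_blobs([(0, 0), (10, 0), (0, 10)], n_per=8, sd=0.3)
best_k, mean_gamma = choose_k(data, range(2, 5), n_boot=10)
print(best_k, mean_gamma)
```

The toy data has three well-separated groups, so the mean gamma should peak at the true number of groups. Real marker data would need a distance suited to genotypes and a much faster clustering routine than this cubic-time loop.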
