An algorithm for deciding the number of clusters and validation using
simulated data with application to exploring crop population structure
A first step in exploring population structure in crop plants and other organisms is to define the number of subpopulations that exist for a given data set. The genetic marker data sets being generated have become increasingly large over time and commonly are of the high-dimension, low sample size (HDLSS) situation. An algorithm for deciding the number of clusters is proposed, and is validated on simulated data sets varying in both the level of structure and the number of clusters covering the range of variation observed empirically. The algorithm was then tested on six empirical data sets across three small grain species. The algorithm uses bootstrapping, three methods of clustering, and defines the optimum number of clusters based on a common criterion, the Hubert's gamma statistic. Validation on simulated sets coupled with testing on empirical sets suggests that the algorithm can be used for a wide variety of genetic data sets.
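The general idea of scoring candidate cluster counts with Hubert's gamma over bootstrap resamples can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not name the three clustering methods or the exact selection rule, so the code assumes a plain k-means stand-in, averages the normalized gamma (the Pearson correlation between pairwise distances and a 0/1 "different cluster" indicator) over replicates and methods, and picks the k with the highest mean. All function names (`hubert_gamma`, `kmeans`, `choose_k`) are hypothetical.

```python
import math
import random
from itertools import combinations


def hubert_gamma(points, labels):
    """Normalized Hubert's gamma: Pearson correlation between pairwise
    distances and a 0/1 indicator of 'pair lies in different clusters'."""
    dists, diff = [], []
    for i, j in combinations(range(len(points)), 2):
        dists.append(math.dist(points[i], points[j]))
        diff.append(0.0 if labels[i] == labels[j] else 1.0)
    n = len(dists)
    mean_d, mean_f = sum(dists) / n, sum(diff) / n
    cov = sum((d - mean_d) * (f - mean_f) for d, f in zip(dists, diff))
    var_d = sum((d - mean_d) ** 2 for d in dists)
    var_f = sum((f - mean_f) ** 2 for f in diff)
    if var_d == 0 or var_f == 0:
        return 0.0  # degenerate case, e.g. every point in one cluster
    return cov / math.sqrt(var_d * var_f)


def kmeans(points, k, rng, iters=20):
    """Very small k-means (Lloyd's algorithm); an assumed stand-in for
    the clustering methods, which the abstract does not specify."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(x) / len(members)
                                   for x in zip(*members))
    return labels


def choose_k(points, k_values, cluster_fns, n_boot=20, seed=0):
    """Average gamma over bootstrap resamples and clustering methods,
    then return the k with the highest mean (an assumed rule)."""
    rng = random.Random(seed)
    scores = {k: [] for k in k_values}
    for _ in range(n_boot):
        sample = [rng.choice(points) for _ in points]  # with replacement
        for k in k_values:
            for fn in cluster_fns:
                scores[k].append(hubert_gamma(sample, fn(sample, k, rng)))
    means = {k: sum(v) / len(v) for k, v in scores.items()}
    return max(means, key=means.get), means


if __name__ == "__main__":
    gen = random.Random(1)
    # Two synthetic, well-separated groups in 5 dimensions (made-up data).
    data = [tuple(gen.gauss(mu, 1.0) for _ in range(5))
            for mu in (0.0, 8.0) for _ in range(15)]
    best_k, mean_gamma = choose_k(data, [2, 3, 4], [kmeans])
    print(best_k, mean_gamma)
```

Passing the clustering routines as a list keeps the selection loop independent of any particular method, so other clustering functions with the same `(points, k, rng)` signature could be added alongside the k-means stand-in.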