AN ALGORITHM FOR DECIDING THE NUMBER OF CLUSTERS AND VALIDATION USING SIMULATED DATA WITH APPLICATION TO EXPLORING CROP POPULATION STRUCTURE

Mark A. Newell; Dianne Cook; Heike Hofmann; Jean-Luc Jannink

Abstract

A first step in exploring population structure in crop plants and other organisms is to define the number of subpopulations that exist for a given data set. The genetic marker data sets being generated have become increasingly large over time and are commonly of the high-dimension, low sample size (HDLSS) type. An algorithm for deciding the number of clusters is proposed and is validated on simulated data sets varying in both the level of structure and the number of clusters, covering the range of variation observed empirically. The algorithm was then tested on six empirical data sets across three small grain species. The algorithm uses bootstrapping and three methods of clustering, and defines the optimum number of clusters based on a common criterion, Hubert's gamma statistic. Validation on simulated sets coupled with testing on empirical sets suggests that the algorithm can be used for a wide variety of genetic data sets.
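The general idea in the abstract (bootstrap the data, cluster each resample for several candidate numbers of clusters, and pick the number that maximizes Hubert's gamma) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions: the abstract does not name the three clustering methods, so a naive average-linkage clustering stands in for them; the bootstrap is assumed to resample marker columns; and Hubert's gamma is taken to be the Pearson correlation between pairwise distances and a 0/1 indicator of whether two samples fall in different clusters. All function names (`pairwise_distances`, `average_linkage`, `hubert_gamma`, `choose_k`, `make_toy`) and the toy data are hypothetical.

```python
import math
import random
from itertools import combinations


def pairwise_distances(rows):
    """Euclidean distance matrix between rows (lists of floats)."""
    n = len(rows)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = math.dist(rows[i], rows[j])
    return dist


def average_linkage(dist, k):
    """Naive agglomerative clustering (average linkage), stopped at k clusters.
    Assumption: a stand-in for the unnamed clustering methods in the abstract."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            link = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            link /= len(clusters[a]) * len(clusters[b])
            if best is None or link < best[0]:
                best = (link, a, b)
        _, a, b = best  # a < b, so deleting b leaves index a valid
        clusters[a].extend(clusters[b])
        del clusters[b]
    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def hubert_gamma(dist, labels):
    """Pearson correlation between pairwise distances and an indicator that is
    1 when two samples are in different clusters (assumed variant of gamma)."""
    pairs = list(combinations(range(len(labels)), 2))
    d = [dist[i][j] for i, j in pairs]
    y = [1.0 if labels[i] != labels[j] else 0.0 for i, j in pairs]
    mean_d, mean_y = sum(d) / len(d), sum(y) / len(y)
    cov = sum((a - mean_d) * (b - mean_y) for a, b in zip(d, y))
    sd_d = math.sqrt(sum((a - mean_d) ** 2 for a in d))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_d * sd_y) if sd_d > 0 and sd_y > 0 else 0.0


def choose_k(data, k_values, n_boot=20, seed=0):
    """Average gamma over bootstrap resamples of the columns (assumption:
    markers are resampled); return (best_k, mean_gamma_by_k)."""
    rng = random.Random(seed)
    n_features = len(data[0])
    totals = {k: 0.0 for k in k_values}
    for _ in range(n_boot):
        cols = [rng.randrange(n_features) for _ in range(n_features)]
        sample = [[row[c] for c in cols] for row in data]
        dist = pairwise_distances(sample)
        for k in k_values:
            totals[k] += hubert_gamma(dist, average_linkage(dist, k))
    means = {k: total / n_boot for k, total in totals.items()}
    return max(means, key=means.get), means


def make_toy(seed=1, groups=3, per_group=5, n_features=40):
    """Hypothetical HDLSS-style toy data: few samples, many features,
    with well-separated group centers."""
    rng = random.Random(seed)
    data = []
    for _ in range(groups):
        center = [rng.gauss(0, 5) for _ in range(n_features)]
        for _ in range(per_group):
            data.append([c + rng.gauss(0, 1) for c in center])
    return data


if __name__ == "__main__":
    best_k, means = choose_k(make_toy(), k_values=[2, 3, 4, 5])
    print("chosen number of clusters:", best_k)
    for k, gamma in sorted(means.items()):
        print(f"k={k}: mean gamma over bootstraps = {gamma:.3f}")
```

Averaging the statistic over resamples rather than computing it once is meant to make the choice of the number of clusters less sensitive to any particular subset of markers. A fuller version would repeat the loop for each clustering method and compare the resulting gamma profiles.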

Bibliographic record

Source: The Annals of Applied Statistics, 2013, No. 4, 19 pages
Authors: Mark A. Newell; Dianne Cook; Heike Hofmann; Jean-Luc Jannink
Original format: PDF
Language: English
Chinese Library Classification: Higher mathematics
Keywords: Cluster analysis; high dimensional; low sample size; simulation; genetic marker data; visualization; bootstrap; dimension reduction
