Journal: Marine Mammal Science

Diagnosability of mtDNA with Random Forests: Using sequence data to delimit subspecies

Abstract

We examine the use of an ensemble method, Random Forests, to delimit subspecies using mitochondrial DNA (mtDNA) sequences. Diagnosability, a measure of the ability to correctly determine the taxon of a specimen of unknown origin, has historically been used to delimit subspecies, but few studies have explored how to estimate it from DNA sequences. Using simulated and empirical data sets, we demonstrate that Random Forests produces classification models that perform well for diagnosing subspecies and species. Populations with strong social structure and relatively low abundances (e.g., killer whales, Orcinus orca) were found to be as diagnosable as species. Conversely, comparisons involving subspecies that are abundant (e.g., spinner and spotted dolphins, Stenella longirostris and S. attenuata) are only as diagnosable as many population comparisons. Estimates of diagnosability reported in subspecies and species descriptions should include confidence intervals, which are influenced by the sample sizes of the training data. We also stress the importance of reporting the certainty with which individuals in the training data are classified in order to communicate the strength of the classification model and diagnosability estimate. Guidance as to ideal minimum diagnosability thresholds for subspecies will improve with more comprehensive analyses; however, values in the range of 80%-90% are considered appropriate.
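
The point about confidence intervals and sample size can be illustrated with a minimal sketch. It treats diagnosability as the per-taxon proportion of specimens assigned to the correct taxon and attaches a Wilson score interval to that proportion. The Wilson interval is an assumption chosen for illustration, and the paper may use a different interval method. The function names and the labels "A" and "B" are hypothetical, and the predicted labels are assumed to come from some classifier (for example, the out-of-bag predictions of a Random Forest), which is not fitted here.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z=1.96).

    Assumption: this interval type is chosen for illustration only.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def diagnosability(true_labels, predicted_labels):
    """Per-taxon proportion correctly classified, with a Wilson interval.

    Returns {taxon: (proportion, lower, upper)}.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    totals, hits = {}, {}
    for truth, pred in zip(true_labels, predicted_labels):
        totals[truth] = totals.get(truth, 0) + 1
        hits[truth] = hits.get(truth, 0) + (truth == pred)
    return {
        taxon: (hits[taxon] / totals[taxon],
                *wilson_interval(hits[taxon], totals[taxon]))
        for taxon in totals
    }


if __name__ == "__main__":
    # Hypothetical labels: four specimens of taxon "A", two of taxon "B".
    truth = ["A", "A", "A", "A", "B", "B"]
    predicted = ["A", "A", "A", "B", "B", "B"]
    for taxon, (prop, low, high) in diagnosability(truth, predicted).items():
        print(f"{taxon}: {prop:.2f} (interval {low:.2f} to {high:.2f})")
```

With the same observed proportion, `wilson_interval(90, 100)` gives a narrower interval than `wilson_interval(9, 10)`, which is the sample-size effect the abstract refers to.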
