Journal: Ecology and Evolution

Robust and simplified machine learning identification of pitfall trap‐collected ground beetles at the continental scale


Abstract

Insect populations are changing rapidly, and monitoring these changes is essential for understanding the causes and consequences of such shifts. However, large‐scale insect identification projects are time‐consuming and expensive when done solely by human identifiers. Machine learning offers a possible solution to help collect insect data quickly and efficiently. Here, we outline a methodology for training classification models to identify pitfall trap‐collected insects from image data and then apply the method to identify ground beetles (Carabidae). All beetles were collected by the National Ecological Observatory Network (NEON), a continental‐scale ecological monitoring project with sites across the United States. We describe the procedures for image collection, image data extraction, data preparation, and model training, and compare the performance of five machine learning algorithms and two classification methods (hierarchical vs. single‐level) identifying ground beetles from the species to subfamily level. All models were trained using pre‐extracted feature vectors, not raw image data. Our methodology allows for data to be extracted from multiple individuals within the same image, thus enhancing time efficiency, utilizes relatively simple models that allow for direct assessment of model performance, and can be performed on relatively small datasets. The best performing algorithm, linear discriminant analysis (LDA), reached an accuracy of 84.6% at the species level when naively identifying species, which was further increased to >95% when classifications were limited by known local species pools. Model performance was negatively correlated with taxonomic specificity, with the LDA model reaching an accuracy of ~99% at the subfamily level. When classifying carabid species not included in the training dataset at higher taxonomic levels, the models performed significantly better than if classifications were made randomly.
At higher taxonomic levels, we also observed better performance with the hierarchical classification method than with the single‐level classification method. The general methodology outlined here serves as a proof‐of‐concept for classifying pitfall trap‐collected organisms using machine learning algorithms, and the image data extraction methodology may also be used for purposes other than machine learning. We propose that integration of machine learning in large‐scale identification pipelines will increase efficiency and lead to a greater flow of insect macroecological data, with the potential to be expanded for use with other non‐insect taxa.
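The abstract reports that species-level accuracy rose when classifications were limited by known local species pools, but it does not say how that restriction was implemented. The sketch below shows one plausible way to do it, under stated assumptions: a classifier has already produced a probability for each candidate species, species outside the site's pool are discarded, and the remaining probabilities are renormalized. The function names, the placeholder species labels, and the example scores are hypothetical and are not taken from the paper.

```python
from typing import Dict, Iterable, Mapping, Optional


def restrict_to_species_pool(
    probabilities: Mapping[str, float],
    local_pool: Iterable[str],
) -> Dict[str, float]:
    """Keep only species in the local pool and renormalize their scores.

    Assumption: `probabilities` maps species labels to class probabilities
    from any classifier (for example, LDA posterior probabilities).
    """
    pool = set(local_pool)
    kept = {sp: p for sp, p in probabilities.items() if sp in pool}
    total = sum(kept.values())
    if total <= 0.0:
        # No probability mass falls on any pool species, so return the
        # unrestricted scores instead of inventing a prediction.
        return dict(probabilities)
    return {sp: p / total for sp, p in kept.items()}


def predict_species(
    probabilities: Mapping[str, float],
    local_pool: Optional[Iterable[str]] = None,
) -> str:
    """Return the top-scoring species, optionally limited to a local pool."""
    if local_pool is None:
        scores = dict(probabilities)  # naive identification
    else:
        scores = restrict_to_species_pool(probabilities, local_pool)
    return max(scores, key=scores.get)


# Hypothetical scores for one beetle image (placeholder labels, not real data).
example_scores = {"species_A": 0.45, "species_B": 0.40, "species_C": 0.15}
# Hypothetical pool: only species_B and species_C are known from this site.
example_pool = ["species_B", "species_C"]

naive_label = predict_species(example_scores)                    # "species_A"
pooled_label = predict_species(example_scores, example_pool)     # "species_B"
```

In this toy case the naive prediction picks a species that is not known from the site, while the pool-restricted prediction falls back to the best-scoring local species. That is the kind of correction that could explain an accuracy gain from restricting to local pools.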
